Add Server-Side Scan Planning Support #2775

@geruh

Description

Feature Request / Improvement

Now that Java has added server-side scan planning support (PR #14480), I believe Python is a great place to integrate this functionality! We have all the building blocks, and they are close to complete. I'm creating this issue to track the tasks we need to finish to get it over the line.

Context

We have a few open PRs with the needed model changes, but we have pivoted to reusing our existing models and making sure they serialize properly with Pydantic.

For example, initially we can work on:

  1. Expression serialization, to ensure BooleanExpression and its subclasses serialize correctly for the REST API. Related to @Fokko's work on Remove Generic from expressions #2750 and @rambleraptor's Server-side planning models #2435.

  2. DataFile serialization, to ensure we can properly deserialize the data and delete files from the server response. The OpenAPI spec uses kebab-case (file-format, file-path), but our models expect snake_case (file_format, file_path).
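To make the kebab-case vs. snake_case mismatch concrete, here is a minimal stdlib-only sketch of the key mapping. `DataFileSketch`, `from_rest_payload`, and `to_rest_payload` are hypothetical names for illustration and are not PyIceberg's actual `DataFile` model; in PyIceberg this would more likely be handled with Pydantic field aliases.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class DataFileSketch:
    """Hypothetical stand-in for a data file model (not PyIceberg's DataFile)."""

    file_path: str
    file_format: str
    record_count: int


def from_rest_payload(payload: dict) -> DataFileSketch:
    """Build the model from a kebab-case REST payload, ignoring unknown keys."""
    known = {f.name for f in fields(DataFileSketch)}
    snake = {key.replace("-", "_"): value for key, value in payload.items()}
    return DataFileSketch(**{k: v for k, v in snake.items() if k in known})


def to_rest_payload(data_file: DataFileSketch) -> dict:
    """Serialize back to the kebab-case keys used on the wire."""
    return {
        f.name.replace("_", "-"): getattr(data_file, f.name)
        for f in fields(data_file)
    }


payload = {
    "file-path": "s3://bucket/data/00000.parquet",
    "file-format": "parquet",
    "record-count": 100,
}
data_file = from_rest_payload(payload)
round_tripped = to_rest_payload(data_file)
```

The round trip back to kebab-case matters because the same models may end up being used for both requests and responses.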

REST API Endpoints to Implement

Based on the Iceberg REST spec:

  1. POST /v1/{prefix}/namespaces/{namespace}/tables/{table}/plan - Submit scan for planning
  2. GET /v1/{prefix}/namespaces/{namespace}/tables/{table}/plan/{plan-id} - Fetch planning result
  3. DELETE /v1/{prefix}/namespaces/{namespace}/tables/{table}/plan/{plan-id} - Cancel planning
  4. POST /v1/{prefix}/namespaces/{namespace}/tables/{table}/tasks - Fetch scan tasks for a plan task
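As a rough sketch, the four routes above could be built from one shared table path. The helper names (`table_path`, `plan_endpoints`) are hypothetical, and the sketch assumes multi-level namespaces are joined with the unit separator (0x1F) before URL-encoding and that the prefix is already URL-safe.

```python
from urllib.parse import quote

# Assumption: multi-level namespaces are joined with the unit separator (0x1F).
NAMESPACE_SEPARATOR = "\x1f"


def table_path(prefix: str, namespace: tuple[str, ...], table: str) -> str:
    """Return the shared /v1/{prefix}/namespaces/{namespace}/tables/{table} path."""
    encoded_namespace = quote(NAMESPACE_SEPARATOR.join(namespace), safe="")
    encoded_table = quote(table, safe="")
    # The prefix is assumed to be URL-safe already and is used as-is.
    return f"/v1/{prefix}/namespaces/{encoded_namespace}/tables/{encoded_table}"


def plan_endpoints(
    prefix: str, namespace: tuple[str, ...], table: str, plan_id: str
) -> dict[str, tuple[str, str]]:
    """Map each scan planning operation to its (HTTP method, path) pair."""
    base = table_path(prefix, namespace, table)
    plan = f"{base}/plan"
    encoded_plan_id = quote(plan_id, safe="")
    return {
        "submit": ("POST", plan),
        "fetch_result": ("GET", f"{plan}/{encoded_plan_id}"),
        "cancel": ("DELETE", f"{plan}/{encoded_plan_id}"),
        "fetch_tasks": ("POST", f"{base}/tasks"),
    }


endpoints = plan_endpoints("prod", ("accounting", "tax"), "paid", "abc123")
```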

Tasks

We can start with core sync planning. Once that's in place, we can add async support, which looks like it already exists on the Java side in https://github.com/apache/iceberg/blob/main/core/src/test/java/org/apache/iceberg/rest/RESTCatalogAdapter.java

Core Sync Planning

  • Build on @Fokko's expression work to ensure Expression classes serialize properly with Pydantic.
  • Construct only the plan request/response models needed for synchronous planning
  • Add REST scan support for fetching scan tasks, replicating DataScan behavior
  • Parse server response to FileScanTask objects (handle Data/DeleteFile construction)
  • Add plan_table_scan() methods
  • Add documentation
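For the response-parsing task, a minimal sketch of turning a completed plan response into task objects might look like the following. It assumes, based on my reading of the REST spec, that a completed response carries `file-scan-tasks` and `delete-files`, and that each task's `delete-file-references` are indexes into that `delete-files` list. `ScanTaskSketch` and `parse_completed_plan` are hypothetical names, not PyIceberg's `FileScanTask` API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanTaskSketch:
    """Hypothetical, simplified stand-in for a FileScanTask."""

    data_file_path: str
    delete_file_paths: tuple[str, ...]


def parse_completed_plan(response: dict) -> list[ScanTaskSketch]:
    """Resolve each task's delete-file references against the shared list."""
    status = response.get("status")
    if status != "completed":
        raise ValueError(f"expected a completed plan, got status={status!r}")

    delete_files = response.get("delete-files", [])
    tasks = []
    for raw_task in response.get("file-scan-tasks", []):
        # Assumption: references are positions in this response's delete-files.
        references = raw_task.get("delete-file-references", [])
        tasks.append(
            ScanTaskSketch(
                data_file_path=raw_task["data-file"]["file-path"],
                delete_file_paths=tuple(
                    delete_files[index]["file-path"] for index in references
                ),
            )
        )
    return tasks


response = {
    "status": "completed",
    "delete-files": [{"file-path": "s3://bucket/deletes/d0.parquet"}],
    "file-scan-tasks": [
        {
            "data-file": {"file-path": "s3://bucket/data/f0.parquet"},
            "delete-file-references": [0],
        },
        {"data-file": {"file-path": "s3://bucket/data/f1.parquet"}},
    ],
}
tasks = parse_completed_plan(response)
```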

Full Scan planning support (Follow-up)

Complete the full scan planning API with async operations and pagination.

  • Add the rest of the models for async planning
  • Add support for endpoints 2 and 3 to RESTScan
  • Add endpoint 4 support to RestCatalog
  • Complete documentation with all scenarios
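The async follow-up could roughly take the shape below: poll endpoint 2 until the plan leaves the `submitted` state, then drain any `plan-tasks` through endpoint 4. This is a sketch under assumptions: the status values (`submitted`, `completed`, `failed`, `cancelled`) reflect my reading of the REST spec, and `wait_for_plan`, `collect_file_scan_tasks`, and the injected callables are hypothetical, with fake responses standing in for HTTP calls.

```python
import time
from typing import Callable


def wait_for_plan(
    fetch_result: Callable[[], dict],
    poll_interval: float = 1.0,
    max_polls: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll the plan result until it completes, fails, or we give up."""
    for _ in range(max_polls):
        result = fetch_result()
        status = result.get("status")
        if status == "completed":
            return result
        if status in ("failed", "cancelled"):
            raise RuntimeError(f"scan planning ended with status={status!r}")
        sleep(poll_interval)
    raise TimeoutError(f"plan not completed after {max_polls} polls")


def collect_file_scan_tasks(
    completed: dict, fetch_tasks: Callable[[str], dict]
) -> list[dict]:
    """Follow plan-tasks until every page of file scan tasks is collected."""
    tasks = list(completed.get("file-scan-tasks", []))
    pending = list(completed.get("plan-tasks", []))
    while pending:
        page = fetch_tasks(pending.pop())
        # Simplification: delete-file references are not resolved per page here.
        tasks.extend(page.get("file-scan-tasks", []))
        pending.extend(page.get("plan-tasks", []))
    return tasks


# Fake server responses standing in for endpoints 2 and 4.
poll_responses = iter(
    [
        {"status": "submitted"},
        {
            "status": "completed",
            "file-scan-tasks": [{"data-file": {"file-path": "a.parquet"}}],
            "plan-tasks": ["task-1"],
        },
    ]
)
task_pages = {
    "task-1": {"file-scan-tasks": [{"data-file": {"file-path": "b.parquet"}}]}
}

completed = wait_for_plan(lambda: next(poll_responses), sleep=lambda _: None)
all_tasks = collect_file_scan_tasks(completed, lambda token: task_pages[token])
paths = [task["data-file"]["file-path"] for task in all_tasks]
```

Injecting the fetch functions keeps the control flow testable without a live catalog; a real implementation would also want to call the cancel endpoint when giving up on a plan.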
