feat(arrow): Support reading nested parquet columns #2001

sundy-li · 2026-01-06T09:19:18Z

Which issue does this PR close?

Closes #.

What changes are included in this PR?

This PR enables projection of nested fields within struct columns when reading parquet files. Previously, selecting a field nested inside a struct would result in a FeatureUnsupported error.

Problem

Consider a schema such as:

id: Int (field_id=1)
person: Struct (field_id=2)
  name: String (field_id=3)
  age: Int (field_id=4)

Selecting a nested field like person.name from this schema made the scan fail with the error "Projecting nested field is not supported now", blocking access to nested column data.

Solution

1. crates/iceberg/src/arrow/reader.rs

Add RecordBatchProjector integration to detect and handle nested field projections
After parquet projection, detect if any requested field IDs are nested (not direct children of the schema's top-level struct)
Create a RecordBatchProjector to extract nested fields from their parent structs, flattening them into the output record batch
Exclude metadata fields (like _file) from nested field detection

2. crates/iceberg/src/arrow/record_batch_transformer.rs

Extend build_field_id_to_arrow_schema_map to recursively index nested struct fields
Add helper function collect_field_ids_recursive to traverse the field hierarchy
This allows the transformer to find field IDs that are nested within structs
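
The recursive indexing described above can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: the `Field` type, the `build_field_id_map` wrapper, and the dotted-path map values are hypothetical stand-ins for the Arrow schema types, and the helper only borrows the name collect_field_ids_recursive; its signature here is an assumption.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an Arrow field that carries an Iceberg field ID.
#[derive(Debug, Clone)]
struct Field {
    name: String,
    field_id: i32,
    /// Non-empty only for struct fields.
    children: Vec<Field>,
}

impl Field {
    fn leaf(name: &str, field_id: i32) -> Self {
        Field { name: name.to_string(), field_id, children: Vec::new() }
    }

    fn with_children(name: &str, field_id: i32, children: Vec<Field>) -> Self {
        Field { name: name.to_string(), field_id, children }
    }
}

/// Walk `fields` depth-first and record the dotted path of every field ID,
/// including IDs that live inside structs. (Assumed signature, for illustration.)
fn collect_field_ids_recursive(fields: &[Field], prefix: &str, out: &mut HashMap<i32, String>) {
    for field in fields {
        let path = if prefix.is_empty() {
            field.name.clone()
        } else {
            format!("{}.{}", prefix, field.name)
        };
        out.insert(field.field_id, path.clone());
        // Leaves have no children, so the recursion stops there.
        collect_field_ids_recursive(&field.children, &path, out);
    }
}

/// Hypothetical wrapper that builds the full field-ID map for a schema.
fn build_field_id_map(fields: &[Field]) -> HashMap<i32, String> {
    let mut out = HashMap::new();
    collect_field_ids_recursive(fields, "", &mut out);
    out
}

/// The example schema from the Problem section.
fn example_schema() -> Vec<Field> {
    vec![
        Field::leaf("id", 1),
        Field::with_children(
            "person",
            2,
            vec![Field::leaf("name", 3), Field::leaf("age", 4)],
        ),
    ]
}
```

With a top-level-only index, field IDs 3 and 4 would be missing from the map; the recursive walk makes them resolvable alongside 1 and 2.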

3. crates/iceberg/src/scan/mod.rs

Remove the restriction that blocked nested field selection (the FeatureUnsupported error)

How it works

When processing a FileScanTask, detect if any requested field IDs are nested by checking if schema.as_struct().field_by_id(id) returns None
If nested fields are detected, create a RecordBatchProjector with the projected arrow schema
The projector builds index paths to locate nested fields (e.g., [1, 0] means column 1, inner field 0)
After parquet reads the data, the projector extracts nested fields from their parent structs
The transformer then processes the flattened batch normally
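
The steps above can be illustrated with a minimal, std-only sketch. Everything here is an assumption made for illustration: `Column`, `Field`, `is_nested`, `index_path`, `extract`, and `project` are hypothetical names and do not reflect the actual RecordBatchProjector API, and real record batches hold Arrow arrays rather than `Vec<String>`.

```rust
/// Hypothetical, simplified column: a leaf of values or a struct of child columns.
#[derive(Debug, Clone, PartialEq)]
enum Column {
    Leaf(Vec<String>),
    Struct(Vec<Column>),
}

/// Hypothetical schema field: an ID plus any struct children.
#[derive(Debug, Clone)]
struct Field {
    field_id: i32,
    children: Vec<Field>,
}

/// True when `id` is not a direct child of the top-level struct.
/// (This sketch does not special-case metadata fields such as _file.)
fn is_nested(schema: &[Field], id: i32) -> bool {
    !schema.iter().any(|f| f.field_id == id)
}

/// Depth-first search for `id`, returning the position taken at each level,
/// e.g. [1, 0] for "column 1, inner field 0".
fn index_path(fields: &[Field], id: i32) -> Option<Vec<usize>> {
    for (i, field) in fields.iter().enumerate() {
        if field.field_id == id {
            return Some(vec![i]);
        }
        if let Some(mut rest) = index_path(&field.children, id) {
            let mut path = vec![i];
            path.append(&mut rest);
            return Some(path);
        }
    }
    None
}

/// Follow `path` through nested struct columns to the target column.
fn extract<'a>(columns: &'a [Column], path: &[usize]) -> Option<&'a Column> {
    let (first, rest) = path.split_first()?;
    let column = columns.get(*first)?;
    if rest.is_empty() {
        return Some(column);
    }
    match column {
        Column::Struct(children) => extract(children, rest),
        Column::Leaf(_) => None, // the path goes deeper than the data does
    }
}

/// Flatten the requested field IDs into top-level output columns.
fn project(schema: &[Field], columns: &[Column], ids: &[i32]) -> Option<Vec<Column>> {
    ids.iter()
        .map(|id| {
            let path = index_path(schema, *id)?;
            extract(columns, &path).cloned()
        })
        .collect()
}

/// Schema and data matching the example: id, person { name, age }.
fn example() -> (Vec<Field>, Vec<Column>) {
    let schema = vec![
        Field { field_id: 1, children: vec![] },
        Field {
            field_id: 2,
            children: vec![
                Field { field_id: 3, children: vec![] },
                Field { field_id: 4, children: vec![] },
            ],
        },
    ];
    let columns = vec![
        Column::Leaf(vec!["1".into(), "2".into()]),
        Column::Struct(vec![
            Column::Leaf(vec!["Alice".into(), "Bob".into()]),
            Column::Leaf(vec!["30".into(), "40".into()]),
        ]),
    ];
    (schema, columns)
}
```

In this sketch, projecting field IDs 1 and 3 resolves the index paths [0] and [1, 0], so the output contains the id column and the name column lifted out of person as two flat columns.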

Are these changes tested?

Yes, added test_read_nested_parquet_column test that:

Creates a parquet file with nested struct data (id, person { name, age })
Reads with projection [1, 3] (selecting id and nested name)
Verifies both the top-level field and nested field are correctly extracted

All 1051 existing tests continue to pass.

Enable projection of nested fields within struct columns when reading parquet files. Previously, selecting a field nested inside a struct would result in a FeatureUnsupported error.

Changes:
- Add RecordBatchProjector integration to extract nested fields from struct columns after parquet projection
- Extend RecordBatchTransformer's field ID mapping to recursively index nested struct fields
- Remove the nested field restriction in scan module that blocked nested field selection

The implementation detects when requested field IDs are nested (not direct children of the schema) and creates a RecordBatchProjector to extract those fields from their parent structs, flattening them into the output record batch.

mbutrovich · 2026-01-08T15:32:48Z

I'll try to take a look at this in the next few days. It's interesting to me since Comet already generates FileScanTasks that manage to read nested types with the current reader, so I want to understand the scope of changes there.

mbutrovich self-requested a review January 8, 2026 15:32
