Fix: Support nested struct field filtering with PyArrow (#953)
Fixes filtering on nested struct fields when using PyArrow for scan operations.
## Problem
When filtering on nested struct fields (e.g., `mazeMetadata.run_id == 'value'`),
PyArrow would fail with:
```
ArrowInvalid: No match for FieldRef.Name(run_id) in ...
```
The issue occurred because PyArrow requires nested field references as tuples
(e.g., `("parent", "child")`) rather than dotted strings (e.g., `"parent.child"`).
## Solution
1. Modified `_ConvertToArrowExpression` to accept an optional `Schema` parameter
2. Added `_get_field_name()` method that converts dotted field paths to tuples
for nested struct fields
3. Updated `expression_to_pyarrow()` to accept and pass the schema parameter
4. Updated all call sites to pass the schema when available
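The role of the schema in step 2 can be illustrated with a small standalone sketch. This is not the actual `_get_field_name()` implementation: `to_field_ref` and the nested-dict schema are hypothetical stand-ins, chosen to show why a schema lookup is useful before splitting a dotted path (a column name may itself contain a dot).

```python
from typing import Any, Dict, Tuple, Union

# Hypothetical stand-in for a table schema: struct columns map to nested
# dicts, leaf columns map to a type name.
SchemaDict = Dict[str, Any]


def to_field_ref(path: str, schema: SchemaDict) -> Union[str, Tuple[str, ...]]:
    """Return a tuple for nested struct paths, or the plain name otherwise."""
    # A top-level column (including one whose literal name contains a dot).
    if path in schema:
        return path
    parts = path.split(".")
    node: Any = schema
    for part in parts:
        if not isinstance(node, dict) or part not in node:
            # The path does not resolve in the schema; leave it unchanged.
            return path
        node = node[part]
    return tuple(parts)


schema = {
    "id": "long",
    "a.b": "int",
    "mazeMetadata": {"run_id": "string"},
}
nested = to_field_ref("mazeMetadata.run_id", schema)
top_level = to_field_ref("id", schema)
dotted_name = to_field_ref("a.b", schema)
```

Here `nested` is the tuple `("mazeMetadata", "run_id")`, while `top_level` and `dotted_name` come back unchanged as strings.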
## Changes
- `pyiceberg/io/pyarrow.py`:
  - Modified `_ConvertToArrowExpression` class to handle nested field paths
  - Updated `expression_to_pyarrow()` signature to accept schema
  - Updated `_expression_to_complementary_pyarrow()` signature
- `pyiceberg/table/__init__.py`:
  - Updated call to `_expression_to_complementary_pyarrow()` to pass schema
- Tests:
  - Added `test_ref_binding_nested_struct_field()` for comprehensive nested field testing
  - Enhanced `test_nested_fields()` with issue #953 scenarios
## Example
```python
# Now works correctly:
table.scan(row_filter="mazeMetadata.run_id == 'abc123'").to_polars()
```
The fix converts the field reference from:
- ❌ `FieldRef.Name(run_id)` (fails - field not found)
- ✅ `FieldRef.Nested(FieldRef.Name(mazeMetadata) FieldRef.Name(run_id))` (works!)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>