Error when filtering by UUID in table scan #2372

@sevakva

Apache Iceberg version

main (development)

Please describe the bug 🐞

Problem
Scanning a PyIceberg table with a row filter that compares a UUID column raises pyarrow.lib.ArrowNotImplementedError: Function 'equal' has no kernel matching input types (extension<arrow.uuid>, extension<arrow.uuid>). The message indicates that PyArrow's equal compute function has no kernel registered for the arrow.uuid extension type, so the filter expression cannot be bound when the scanner is built.

Environment
pyiceberg: Nightly build (expected to support UUIDs)
pyarrow: 21.0.0
Python: 3.13

Code to Reproduce

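import uuid
from pyiceberg.expressions import EqualTo

# `table` is an already-loaded PyIceberg table with a UUID column named "batch_id".
# Reading the scan results fails with ArrowNotImplementedError
# (raised from pyiceberg/io/pyarrow.py, see the stack trace below).
df = table.scan(row_filter=EqualTo("batch_id", uuid.UUID("0190de80-647f-4bbc-a80e-efda686b910f")))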

Full Error Stack Trace

  File "/opt/homebrew/Cellar/python@3.13/3.13.3/Frameworks/Python.framework/Versions/3.13/lib/python3.13/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/opt/homebrew/Cellar/python@3.13/3.13.3/Frameworks/Python.framework/Versions/3.13/lib/python3.13/concurrent/futures/thread.py", line 59, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/Users/.venv/lib/python3.13/site-packages/pyiceberg/io/pyarrow.py", line 1694, in batches_for_task
    return list(self._record_batches_from_scan_tasks_and_deletes([task], deletes_per_file))
  File "/Users/.venv/lib/python3.13/site-packages/pyiceberg/io/pyarrow.py", line 1732, in _record_batches_from_scan_tasks_and_deletes
    for batch in batches:
                 ^^^^^^^
  File "/Users/.venv/lib/python3.13/site-packages/pyiceberg/io/pyarrow.py", line 1518, in _task_to_record_batches
    fragment_scanner = ds.Scanner.from_fragment(
        fragment=fragment,
    ...<4 lines>...
        columns=[col.name for col in file_project_schema.columns],
    )
  File "pyarrow/_dataset.pyx", line 3692, in pyarrow._dataset.Scanner.from_fragment
  File "pyarrow/_dataset.pyx", line 3458, in pyarrow._dataset._populate_builder
  File "pyarrow/_compute.pyx", line 2732, in pyarrow._compute._bind
  File "pyarrow/error.pxi", line 155, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
pyarrow.lib.ArrowNotImplementedError: Function 'equal' has no kernel matching input types (extension<arrow.uuid>, extension<arrow.uuid>)

Expected Behavior
The table scan should successfully filter rows by UUID without throwing a kernel matching error.

Willingness to contribute

  • I can contribute a fix for this bug independently
  • I would be willing to contribute a fix for this bug with guidance from the Iceberg community
  • I cannot contribute a fix for this bug at this time
