Closed
Description
Apache Iceberg version
None
Please describe the bug 🐞
Joining a PyArrow table fails when a non-key column has a list type. See:
➜ iceberg-python git:(fd-align-codestyle) ipython
Python 3.10.14 (main, Mar 19 2024, 21:46:16) [Clang 15.0.0 (clang-1500.3.9.4)]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.31.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import pyarrow as pa
   ...:
   ...: arrow_schema = pa.schema(
   ...:     [
   ...:         pa.field("city", pa.string(), nullable=False),
   ...:         pa.field("tags", pa.list_(pa.string()), nullable=False),
   ...:     ]
   ...: )
   ...:
   ...: # Write some data
   ...: df = pa.Table.from_pylist(
   ...:     [
   ...:         {"city": "Amsterdam", "tags": ["Europe", "Capital"]},
   ...:         {"city": "San Francisco", "tags": ["Amsterdam", "Golden Gate"]},
   ...:     ],
   ...:     schema=arrow_schema,
   ...: )
   ...: joined = df.join(df, "city", join_type="inner")
---------------------------------------------------------------------------
ArrowInvalid Traceback (most recent call last)
Cell In[1], line 18
10 # Write some data
11 df = pa.Table.from_pylist(
12 [
13 {"city": "Amsterdam", "tags": ["Europe", "Capital"]},
(...)
16 schema=arrow_schema,
17 )
---> 18 joined = df.join(df, "city", join_type="inner")
File /opt/homebrew/lib/python3.10/site-packages/pyarrow/table.pxi:5704, in pyarrow.lib.Table.join()
File /opt/homebrew/lib/python3.10/site-packages/pyarrow/acero.py:249, in _perform_join(join_type, left_operand, left_keys, right_operand, right_keys, left_suffix, right_suffix, use_threads, coalesce_keys, output_type)
244 projection = Declaration(
245 "project", ProjectNodeOptions(projections, projected_col_names)
246 )
247 decl = Declaration.from_sequence([decl, projection])
--> 249 result_table = decl.to_table(use_threads=use_threads)
251 if output_type == Table:
252 return result_table
File /opt/homebrew/lib/python3.10/site-packages/pyarrow/_acero.pyx:590, in pyarrow._acero.Declaration.to_table()
File /opt/homebrew/lib/python3.10/site-packages/pyarrow/error.pxi:155, in pyarrow.lib.pyarrow_internal_check_status()
File /opt/homebrew/lib/python3.10/site-packages/pyarrow/error.pxi:92, in pyarrow.lib.check_status()
ArrowInvalid: Data type list<item: string> is not supported in join non-key field tags
Willingness to contribute
- I can contribute a fix for this bug independently
- I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- I cannot contribute a fix for this bug at this time