Question
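import pyarrow as pa

# The Iceberg table was created with the PyArrow schema below
schema = pa.schema(
    [
        pa.field("col1", pa.string(), nullable=True),
    ]
)

# No schema is passed here, so PyArrow infers col1 as type null
df = pa.Table.from_pylist(
    [
        {"col1": None}
    ]
)

# `table` is the existing table created with the schema above
table.overwrite(df)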
In the above example, we encounter the following error:
UnsupportedPyArrowTypeException: Column 'col1' has an unsupported type: null
with this underlying cause:
in _ConvertToIceberg.primitive(self, primitive)
1211 return FixedType(primitive.byte_width)
-> 1213 raise TypeError(f"Unsupported type: {primitive}")
TypeError: Unsupported type: null
Is there any reason we wouldn't want to support the case where PyArrow has typed a field as null?

As a workaround/fix, I was thinking that we could exclude pa.null() fields in visit_pyarrow(obj: pa.StructType, visitor: PyArrowSchemaVisitor[T]). That way the column would effectively be missing, and any required/nullable enforcement would be applied as it is for any other missing column. Would this have any undesired consequences?