Commit 17c4c2c (parent: 2794c88)

Enhance documentation for DataFrame streaming API, clarifying schema handling and limitations

1 file changed: python/datafusion/dataframe.py (17 additions, 5 deletions)
@@ -1114,15 +1114,27 @@ def __arrow_c_stream__(self, requested_schema: object | None = None) -> object:

         The DataFrame is executed using DataFusion's streaming APIs and exposed via
         Arrow's C Stream interface. Record batches are produced incrementally, so the
-        full result set is never materialized in memory. When ``requested_schema`` is
-        provided, only straightforward projections such as column selection or
-        reordering are applied.
+        full result set is never materialized in memory.
+
+        When ``requested_schema`` is provided, DataFusion applies only simple
+        projections such as selecting a subset of existing columns or reordering
+        them. Column renaming, computed expressions, or type coercion are not
+        supported through this interface.

         Args:
-            requested_schema: Attempt to provide the DataFrame using this schema.
+            requested_schema: Either a :py:class:`pyarrow.Schema` or an Arrow C
+                Schema capsule (``PyCapsule``) produced by
+                ``schema._export_to_c_capsule()``. The DataFrame will attempt to
+                align its output with the fields and order specified by this schema.

         Returns:
-            Arrow PyCapsule object representing an ``ArrowArrayStream``.
+            Arrow ``PyCapsule`` object representing an ``ArrowArrayStream``.
+
+        Examples:
+            >>> schema = df.schema()
+            >>> stream = df.__arrow_c_stream__(schema)
+            >>> capsule = schema._export_to_c_capsule()
+            >>> stream = df.__arrow_c_stream__(capsule)
         """
         # ``DataFrame.__arrow_c_stream__`` in the Rust extension leverages
         # ``execute_stream_partitioned`` under the hood to stream batches while
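For context beyond the diff, the sketch below shows how a PyArrow consumer might drive this stream. It is illustrative only and not part of the commit: it assumes the datafusion and pyarrow packages are installed, that the installed pyarrow supports the Arrow PyCapsule stream protocol (pyarrow 14 or newer), and that the sample data and column names ("a", "b") are made up for the example.

# Hedged sketch: consume a DataFusion DataFrame via its __arrow_c_stream__
# implementation from PyArrow. Not part of the commit; sample data is invented.
import pyarrow as pa
from datafusion import SessionContext

ctx = SessionContext()
df = ctx.from_pydict({"a": [1, 2, 3], "b": ["x", "y", "z"]})

# RecordBatchReader.from_stream accepts any object implementing
# __arrow_c_stream__, so the DataFrame is streamed batch by batch
# instead of being collected into memory first.
reader = pa.RecordBatchReader.from_stream(df)
for batch in reader:
    print(batch.num_rows, batch.schema.names)

# Passing a schema forwards it as ``requested_schema``. Per the docstring
# above, only selecting or reordering existing columns is honored; renames,
# computed expressions, and type coercion are not. Here the requested schema
# is built from the DataFrame's own fields to reorder "b" before "a".
fields = {f.name: f for f in df.schema()}
subset = pa.schema([fields["b"], fields["a"]])
reader = pa.RecordBatchReader.from_stream(df, schema=subset)
print(reader.schema.names)  # expected ['b', 'a'] under the rules above

Because the Rust side streams through ``execute_stream_partitioned`` (see the comment in the diff), iterating the reader pulls batches on demand rather than materializing the full result set.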
