Commit 31c76dc

docs: update documentation for DataFrame iteration and RecordBatch usage
1 parent 98e7e00 commit 31c76dc

2 files changed: +9 -18 lines changed

docs/source/user-guide/dataframe/index.rst
Lines changed: 2 additions & 2 deletions

@@ -131,13 +131,13 @@ Terminal Operations
 -------------------

 To materialize the results of your DataFrame operations, call a terminal method or iterate over the
-``DataFrame`` to consume :py:class:`datafusion.record_batch.RecordBatch` objects lazily:
+``DataFrame`` to consume :py:class:`~datafusion.record_batch.RecordBatch` objects lazily:

 .. code-block:: python

     # Iterate over the DataFrame to stream record batches
     for batch in df:
-        ...  # batch is a datafusion.record_batch.RecordBatch
+        ...  # each batch is a RecordBatch (use batch.to_pyarrow() for PyArrow)

     # Collect all data as PyArrow RecordBatches
     result_batches = df.collect()
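
For context, a minimal end-to-end sketch of the iteration pattern the updated docs describe. The setup is hypothetical (a SessionContext named ctx and a Parquet file at data/my_table.parquet); only the iteration, to_pyarrow() conversion, and collect() calls come from the documented API:

from datafusion import SessionContext

# Hypothetical setup: any registered table works the same way.
ctx = SessionContext()
ctx.register_parquet("my_table", "data/my_table.parquet")
df = ctx.table("my_table")

# Iterating the DataFrame streams datafusion RecordBatch objects lazily;
# nothing is materialized until the loop pulls the next batch.
for batch in df:
    pa_batch = batch.to_pyarrow()  # convert when a pyarrow.RecordBatch is needed
    print(pa_batch.num_rows)

# Terminal operation: collect everything at once as PyArrow RecordBatches.
result_batches = df.collect()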

python/datafusion/dataframe.py
Lines changed: 7 additions & 16 deletions

@@ -295,7 +295,7 @@ class DataFrame:
     up the plan without executing it, and results are only materialized during a
     terminal operation (for example, :py:meth:`collect`, :py:meth:`show`, or
     :py:meth:`to_pandas`) or when iterating over the DataFrame, which yields
-    :class:`pyarrow.RecordBatch` objects lazily.
+    :py:class:`~datafusion.record_batch.RecordBatch` objects lazily.

     See :ref:`user_guide_concepts` in the online documentation for more information.
     """

@@ -1134,25 +1134,16 @@ def __arrow_c_stream__(self, requested_schema: object | None = None) -> object:
         return self.df.__arrow_c_stream__(requested_schema)

     def __iter__(self) -> Iterator[RecordBatch]:
-        """Yield :class:`datafusion.record_batch.RecordBatch` objects lazily.
+        """Yield record batches from this DataFrame lazily.

-        This delegates to :py:meth:`to_stream` without converting each batch to a
-        :class:`pyarrow.RecordBatch`. Use
-        :py:meth:`datafusion.record_batch.RecordBatch.to_pyarrow` when a
-        :class:`pyarrow.RecordBatch` is required.
+        This delegates to :py:meth:`to_stream` without eagerly materializing the
+        entire result set.
         """
-        for batch in self.to_stream():
-            yield batch
+        return iter(self.to_stream())

     def __aiter__(self) -> AsyncIterator[RecordBatch]:
-        """Asynchronously yield :class:`datafusion.record_batch.RecordBatch` objects lazily."""
-        stream = self.to_stream()
-
-        async def iterator() -> AsyncIterator[RecordBatch]:
-            async for batch in stream:
-                yield batch
-
-        return iterator()
+        """Asynchronously yield record batches from this DataFrame lazily."""
+        return self.to_stream()

     def transform(self, func: Callable[..., DataFrame], *args: Any) -> DataFrame:
         """Apply a function to the current DataFrame which returns another DataFrame.
