
Commit 0ff4c0d

Clarify behavior of __arrow_c_stream__ execution, emphasizing incremental batch processing and memory efficiency
1 parent 17c4c2c commit 0ff4c0d

1 file changed: 10 additions, 4 deletions


docs/source/user-guide/io/arrow.rst

Lines changed: 10 additions & 4 deletions
@@ -60,10 +60,16 @@ Exporting from DataFusion
 DataFusion DataFrames implement ``__arrow_c_stream__`` PyCapsule interface, so any
 Python library that accepts these can import a DataFusion DataFrame directly.
 
-.. warning::
-    It is important to note that this will cause the DataFrame execution to happen, which may be
-    a time consuming task. That is, you will cause a
-    :py:func:`datafusion.dataframe.DataFrame.collect` operation call to occur.
+.. note::
+    Invoking ``__arrow_c_stream__`` still triggers execution of the underlying
+    query, but batches are yielded incrementally rather than materialized all at
+    once in memory. Consumers can process the stream as it arrives, avoiding the
+    memory overhead of a full
+    :py:func:`datafusion.dataframe.DataFrame.collect`.
+
+    For an example of this streamed execution and its memory safety, see the
+    ``test_arrow_c_stream_large_dataset`` unit test in
+    :mod:`python.tests.test_io`.
 
 
 .. ipython:: python