@@ -25,8 +25,9 @@ The ``DataFrame`` class is the core abstraction in DataFusion that represents ta
 on that data. DataFrames provide a flexible API for transforming data through various operations such as
 filtering, projection, aggregation, joining, and more.
 
-A DataFrame represents a logical plan that is lazily evaluated. The actual execution occurs only when
-terminal operations like ``collect()``, ``show()``, or ``to_pandas()`` are called.
+A DataFrame represents a lazily evaluated logical plan. No computation occurs until you perform a
+terminal operation (such as ``collect()``, ``show()``, or ``to_pandas()``) or iterate over the
+``DataFrame``.
 
 Creating DataFrames
 -------------------
@@ -129,20 +130,25 @@ DataFusion's DataFrame API offers a wide range of operations:
 Terminal Operations
 -------------------
 
-To materialize the results of your DataFrame operations:
+To materialize the results of your DataFrame operations, call a terminal method or iterate over the
+``DataFrame`` to consume ``pyarrow.RecordBatch`` objects lazily:
 
 .. code-block:: python
 
+    # Iterate over the DataFrame to stream record batches
+    for batch in df:
+        ...  # process each batch as it is produced
+
     # Collect all data as PyArrow RecordBatches
     result_batches = df.collect()
-
+
     # Convert to various formats
     pandas_df = df.to_pandas()        # Pandas DataFrame
     polars_df = df.to_polars()        # Polars DataFrame
     arrow_table = df.to_arrow_table() # PyArrow Table
     py_dict = df.to_pydict()          # Python dictionary
     py_list = df.to_pylist()          # Python list of dictionaries
-
+
     # Display results
     df.show()                         # Print tabular format to console
 
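Note on the hunk above: the difference between iterating and calling ``collect()`` can be illustrated with a toy model. This is a minimal pure-Python sketch, not DataFusion's implementation. ``LazyFrame``, its ``filter`` and ``collect`` methods, the ``Batch`` alias, and the ``source`` generator are all hypothetical names invented for this example; a plain list of integers stands in for a ``pyarrow.RecordBatch``.

```python
from typing import Callable, Iterable, Iterator, List

Batch = List[int]  # hypothetical stand-in for a pyarrow.RecordBatch


class LazyFrame:
    """Toy lazily evaluated plan (hypothetical; not the DataFusion API)."""

    def __init__(self, source: Callable[[], Iterable[Batch]], steps=()):
        self._source = source        # called only when execution starts
        self._steps = tuple(steps)   # recorded transformations (the "plan")

    def filter(self, predicate: Callable[[int], bool]) -> "LazyFrame":
        # Building the plan does no work; it only returns a new plan object.
        return LazyFrame(self._source, self._steps + (predicate,))

    def __iter__(self) -> Iterator[Batch]:
        # Iteration executes the plan one batch at a time.
        for batch in self._source():
            for predicate in self._steps:
                batch = [row for row in batch if predicate(row)]
            yield batch

    def collect(self) -> List[Batch]:
        # Terminal operation: materializes every batch in memory at once.
        return list(self)


produced = []  # records which batches the source has generated so far


def source() -> Iterator[Batch]:
    for start in (0, 4, 8):
        produced.append(start)
        yield list(range(start, start + 4))


df = LazyFrame(source).filter(lambda x: x % 2 == 0)
assert produced == []   # defining the plan has not touched the source

stream = iter(df)
first = next(stream)    # pulls exactly one batch through the plan
```

After ``next(stream)`` only the first source batch has been generated, so memory use is bounded by one batch. Calling ``collect()`` on the same toy object would rerun the source and hold every batch in a list, which is the trade-off between streaming and materializing.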
@@ -154,10 +160,9 @@ PyArrow Streaming
 
 DataFusion DataFrames implement the ``__arrow_c_stream__`` protocol, enabling
 zero-copy streaming into libraries like `PyArrow <https://arrow.apache.org/>`_.
-Earlier versions eagerly converted the entire DataFrame when exporting to
-PyArrow, which could exhaust memory on large datasets. With streaming, batches
-are produced lazily so you can process arbitrarily large results without
-out-of-memory errors.
+Because DataFrames are lazily evaluated, batches are produced only as they are
+consumed so you can process arbitrarily large results without out-of-memory
+errors.
 
 .. code-block:: python
 