mkdocs/docs/api.md
3 additions & 3 deletions
@@ -394,11 +394,11 @@ Within each file, batch ordering always follows row order. The `limit` parameter
| Use case | Recommended config |
|---|---|
| Small tables, simple queries | Default — no extra args needed |
-| Large tables, memory-constrained | `streaming=True` — one file at a time, minimal memory |
-| Maximum throughput with bounded memory | `streaming=True, concurrent_files=N` — tune N to balance throughput vs memory |
+| Large tables, memory-constrained | `order=ScanOrder.ARRIVAL` — one file at a time, minimal memory |
+| Maximum throughput with bounded memory | `order=ScanOrder.ARRIVAL, concurrent_files=N` — tune N to balance throughput vs memory |
| Fine-grained batch control | Add `batch_size=N` to any of the above |

-**Note:** `streaming=True` yields batches in arrival order (interleaved across files when `concurrent_files > 1`). For deterministic file ordering, use the default non-streaming mode. `batch_size` is usually an advanced tuning knob — the PyArrow default of 131,072 rows works well for most workloads.
+**Note:** `ScanOrder.ARRIVAL` yields batches in arrival order (interleaved across files when `concurrent_files > 1`). For deterministic file ordering, use the default `ScanOrder.TASK` mode. `batch_size` is usually an advanced tuning knob — the PyArrow default of 131,072 rows works well for most workloads.

To avoid any type inconsistencies during writing, you can convert the Iceberg table schema to Arrow:
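
(The conversion snippet that line introduces sits outside this hunk.) As a hedged sketch of how the renamed scan options might be used, assuming they are accepted by `Table.scan(...).to_arrow_batch_reader(...)` and that `ScanOrder` is importable from `pyiceberg.table` (neither is confirmed by this diff):

```python
# Sketch only: the option names (order, concurrent_files, batch_size) come
# from the table above, but the method that accepts them and the import
# path of ScanOrder are assumptions, not confirmed by this diff.
from pyiceberg.catalog import load_catalog
from pyiceberg.table import ScanOrder  # assumed import path for the new enum

catalog = load_catalog("default")
tbl = catalog.load_table("examples.events")  # hypothetical table identifier

# Maximum throughput with bounded memory: arrival-order batches, a few
# files read concurrently, default PyArrow batch size (131,072 rows).
reader = tbl.scan().to_arrow_batch_reader(
    order=ScanOrder.ARRIVAL,  # batches yielded as files finish, interleaved
    concurrent_files=4,       # tune to trade memory for throughput
)
for batch in reader:
    print(batch.num_rows)
```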