Commit 3ef3eb2

sumedhsakdeo and claude committed

docs: add configuration guidance table to streaming API docs

Add a "which config should I use?" tip box with recommended starting points for common use cases, and clarify that `batch_size` is an advanced tuning knob.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

1 parent 186b893 commit 3ef3eb2

File tree

1 file changed, +11 −0 lines changed


mkdocs/docs/api.md

Lines changed: 11 additions & 0 deletions
````diff
@@ -385,6 +385,17 @@ for buf in tbl.scan().to_arrow_batch_reader(streaming=True, concurrent_files=4,
 
 Within each file, batch ordering always follows row order. The `limit` parameter is enforced correctly regardless of configuration.
 
+!!! tip "Which configuration should I use?"
+
+    | Use case | Recommended config |
+    |---|---|
+    | Small tables, simple queries | Default — no extra args needed |
+    | Large tables, memory-constrained | `streaming=True` — one file at a time, minimal memory |
+    | Maximum throughput with bounded memory | `streaming=True, concurrent_files=N` — tune N to balance throughput vs memory |
+    | Fine-grained batch control | Add `batch_size=N` to any of the above |
+
+    **Note:** `streaming=True` yields batches in arrival order (interleaved across files when `concurrent_files > 1`). For deterministic file ordering, use the default non-streaming mode. `batch_size` is usually an advanced tuning knob — the PyArrow default of 131,072 rows works well for most workloads.
+
 To avoid any type inconsistencies during writing, you can convert the Iceberg table schema to Arrow:
 
 ```python
````
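The note added by this commit says that with `concurrent_files > 1` batches arrive interleaved across files, while each file's own batches stay in row order. A small pure-Python sketch (illustrative only, not PyIceberg code; the file and batch names are made up) shows what that ordering guarantee means for a consumer:

```python
# Illustrative sketch: batches from concurrently read files interleave
# in arrival order, but each file's batches keep their own row order.
from itertools import zip_longest

def interleave(*per_file_batches):
    """Round-robin merge, standing in for nondeterministic arrival order."""
    out = []
    for round_ in zip_longest(*per_file_batches):
        out.extend(b for b in round_ if b is not None)
    return out

file_a = ["a1", "a2", "a3"]   # batches of file A, in row order
file_b = ["b1", "b2"]         # batches of file B, in row order

merged = interleave(file_a, file_b)
print(merged)  # → ['a1', 'b1', 'a2', 'b2', 'a3']

# The merged order interleaves files, but within each file the
# relative order of batches is preserved:
assert [b for b in merged if b.startswith("a")] == file_a
assert [b for b in merged if b.startswith("b")] == file_b
```

A consumer that only cares about per-file row order (e.g. per-partition aggregation) is therefore unaffected by `concurrent_files`; only consumers that need a deterministic global file order need the default non-streaming mode.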

0 commit comments
