Commit 3ef3eb2

sumedhsakdeo and claude committed

docs: add configuration guidance table to streaming API docs

Add a "which config should I use?" tip box with recommended starting points for common use cases, and clarify that `batch_size` is an advanced tuning knob.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

1 parent 186b893 commit 3ef3eb2

File tree

1 file changed, +11 −0 lines changed


mkdocs/docs/api.md

Lines changed: 11 additions & 0 deletions
````diff
@@ -385,6 +385,17 @@ for buf in tbl.scan().to_arrow_batch_reader(streaming=True, concurrent_files=4,
 
 Within each file, batch ordering always follows row order. The `limit` parameter is enforced correctly regardless of configuration.
 
+!!! tip "Which configuration should I use?"
+
+    | Use case | Recommended config |
+    |---|---|
+    | Small tables, simple queries | Default — no extra args needed |
+    | Large tables, memory-constrained | `streaming=True` — one file at a time, minimal memory |
+    | Maximum throughput with bounded memory | `streaming=True, concurrent_files=N` — tune N to balance throughput vs memory |
+    | Fine-grained batch control | Add `batch_size=N` to any of the above |
+
+    **Note:** `streaming=True` yields batches in arrival order (interleaved across files when `concurrent_files > 1`). For deterministic file ordering, use the default non-streaming mode. `batch_size` is usually an advanced tuning knob — the PyArrow default of 131,072 rows works well for most workloads.
+
 To avoid any type inconsistencies during writing, you can convert the Iceberg table schema to Arrow:
 
 ```python
````
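The note added by this commit says that with `concurrent_files > 1` batches arrive interleaved across files, while each file's own batches stay in row order. A small pure-Python sketch (illustrative only, not PyIceberg code; the file and batch names are made up) shows what that ordering guarantee means for a consumer:

```python
# Illustrative sketch: batches from concurrently read files interleave
# in arrival order, but each file's batches keep their own row order.
from itertools import zip_longest

def interleave(*per_file_batches):
    """Round-robin merge, standing in for nondeterministic arrival order."""
    out = []
    for round_ in zip_longest(*per_file_batches):
        out.extend(b for b in round_ if b is not None)
    return out

file_a = ["a1", "a2", "a3"]   # batches of file A, in row order
file_b = ["b1", "b2"]         # batches of file B, in row order

merged = interleave(file_a, file_b)
print(merged)  # → ['a1', 'b1', 'a2', 'b2', 'a3']

# The merged order interleaves files, but within each file the
# relative order of batches is preserved:
assert [b for b in merged if b.startswith("a")] == file_a
assert [b for b in merged if b.startswith("b")] == file_b
```

A consumer that only cares about per-file row order (e.g. per-partition aggregation) is therefore unaffected by `concurrent_files`; only consumers that need a deterministic global file order need the default non-streaming mode.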

0 commit comments
