Remove @pytest.mark.benchmark so the read throughput tests are included
in the default `make test` filter as parametrize-marked tests.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
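The test change described above can be sketched roughly as follows. This is a hypothetical illustration, not the repository's actual test: the function names, parameters, and the `read_all_rows` stand-in are all invented here. The point is the shape of the change: the throughput test is driven by `@pytest.mark.parametrize` over reader configurations rather than being gated behind `@pytest.mark.benchmark`, so it is collected by the default `make test` filter.

```python
import pytest


def read_all_rows(streaming: bool, concurrent_files: int) -> int:
    """Stand-in for a table scan; a real test would read an Iceberg table.

    Here we just pretend each concurrently-read file yields 100 rows.
    """
    batches = [100] * concurrent_files
    return sum(batches)


# Parametrized over reader configurations instead of carrying
# @pytest.mark.benchmark, so it runs under the default test filter.
@pytest.mark.parametrize(
    "streaming,concurrent_files",
    [(False, 1), (True, 1), (True, 4)],
)
def test_read_throughput(streaming, concurrent_files):
    rows_read = read_all_rows(streaming=streaming, concurrent_files=concurrent_files)
    assert rows_read > 0
```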
`mkdocs/docs/api.md` — 8 additions, 8 deletions
```diff
@@ -385,16 +385,16 @@ for buf in tbl.scan().to_arrow_batch_reader(streaming=True, concurrent_files=4,
 
 Within each file, batch ordering always follows row order. The `limit` parameter is enforced correctly regardless of configuration.
 
-!!! tip "Which configuration should I use?"
+**Which configuration should I use?**
 
-    | Use case | Recommended config |
-    |---|---|
-    | Small tables, simple queries | Default — no extra args needed |
-    | Large tables, memory-constrained | `streaming=True` — one file at a time, minimal memory |
-    | Maximum throughput with bounded memory | `streaming=True, concurrent_files=N` — tune N to balance throughput vs memory |
-    | Fine-grained batch control | Add `batch_size=N` to any of the above |
+| Use case | Recommended config |
+|---|---|
+| Small tables, simple queries | Default — no extra args needed |
+| Large tables, memory-constrained | `streaming=True` — one file at a time, minimal memory |
+| Maximum throughput with bounded memory | `streaming=True, concurrent_files=N` — tune N to balance throughput vs memory |
+| Fine-grained batch control | Add `batch_size=N` to any of the above |
 
-    **Note:** `streaming=True` yields batches in arrival order (interleaved across files when `concurrent_files > 1`). For deterministic file ordering, use the default non-streaming mode. `batch_size` is usually an advanced tuning knob — the PyArrow default of 131,072 rows works well for most workloads.
+**Note:** `streaming=True` yields batches in arrival order (interleaved across files when `concurrent_files > 1`). For deterministic file ordering, use the default non-streaming mode. `batch_size` is usually an advanced tuning knob — the PyArrow default of 131,072 rows works well for most workloads.
 
 To avoid any type inconsistencies during writing, you can convert the Iceberg table schema to Arrow:
```