
Commit 1da7eb6

sumedhsakdeo and claude committed
chore: remove benchmark marker so tests run in CI
Remove @pytest.mark.benchmark so the read throughput tests are included in the default `make test` filter as parametrize-marked tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 3ef3eb2 · commit 1da7eb6

File tree

2 files changed: +9 -10 lines


mkdocs/docs/api.md

Lines changed: 8 additions & 8 deletions
@@ -385,16 +385,16 @@ for buf in tbl.scan().to_arrow_batch_reader(streaming=True, concurrent_files=4,
 
 Within each file, batch ordering always follows row order. The `limit` parameter is enforced correctly regardless of configuration.
 
-!!! tip "Which configuration should I use?"
+**Which configuration should I use?**
 
-    | Use case | Recommended config |
-    |---|---|
-    | Small tables, simple queries | Default — no extra args needed |
-    | Large tables, memory-constrained | `streaming=True` — one file at a time, minimal memory |
-    | Maximum throughput with bounded memory | `streaming=True, concurrent_files=N` — tune N to balance throughput vs memory |
-    | Fine-grained batch control | Add `batch_size=N` to any of the above |
+| Use case | Recommended config |
+|---|---|
+| Small tables, simple queries | Default — no extra args needed |
+| Large tables, memory-constrained | `streaming=True` — one file at a time, minimal memory |
+| Maximum throughput with bounded memory | `streaming=True, concurrent_files=N` — tune N to balance throughput vs memory |
+| Fine-grained batch control | Add `batch_size=N` to any of the above |
 
-    **Note:** `streaming=True` yields batches in arrival order (interleaved across files when `concurrent_files > 1`). For deterministic file ordering, use the default non-streaming mode. `batch_size` is usually an advanced tuning knob — the PyArrow default of 131,072 rows works well for most workloads.
+**Note:** `streaming=True` yields batches in arrival order (interleaved across files when `concurrent_files > 1`). For deterministic file ordering, use the default non-streaming mode. `batch_size` is usually an advanced tuning knob — the PyArrow default of 131,072 rows works well for most workloads.
 
 To avoid any type inconsistencies during writing, you can convert the Iceberg table schema to Arrow:
 
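To make the configuration table above concrete, here is a minimal sketch of the "maximum throughput with bounded memory" case, assuming a table handle obtained via `load_catalog`; the catalog and table names are placeholders, and the `streaming`/`concurrent_files` arguments are the ones described in this doc section.

```python
from pyiceberg.catalog import load_catalog

# Placeholder catalog and table names for illustration.
catalog = load_catalog("default")
tbl = catalog.load_table("examples.events")

# Stream batches as they arrive, reading up to 4 files concurrently.
total_rows = 0
for batch in tbl.scan().to_arrow_batch_reader(streaming=True, concurrent_files=4):
    # Batches follow row order within each file but may interleave
    # across files when concurrent_files > 1.
    total_rows += batch.num_rows

print(f"read {total_rows} rows")
```

For the memory-constrained case in the table, leaving out `concurrent_files` (or setting it to 1) with `streaming=True` keeps one file in flight at a time.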

tests/benchmark/test_read_benchmark.py

Lines changed: 1 addition & 2 deletions
@@ -22,7 +22,7 @@
 Memory is measured using pa.total_allocated_bytes() which tracks PyArrow's C++
 memory pool (Arrow buffers, Parquet decompression), not Python heap allocations.
 
-Run with: uv run pytest tests/benchmark/test_read_benchmark.py -v -s -m benchmark
+Run with: uv run pytest tests/benchmark/test_read_benchmark.py -v -s
 """
 
 import gc
@@ -84,7 +84,6 @@ def benchmark_table(tmp_path_factory: pytest.TempPathFactory) -> Table:
     return table
 
 
-@pytest.mark.benchmark
 @pytest.mark.parametrize(
     "streaming,concurrent_files,batch_size",
     [
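With the marker gone, the read throughput tests are collected like any other parametrized test under the default filter. A simplified sketch of the resulting shape follows; only the decorator and parameter names come from the diff, while the parameter tuples and test body are hypothetical.

```python
import pytest


@pytest.mark.parametrize(
    "streaming,concurrent_files,batch_size",
    [
        # Hypothetical configurations for illustration only.
        (False, None, None),   # default non-streaming read
        (True, 1, None),       # streaming, one file at a time
        (True, 4, 64_000),     # streaming, several files concurrently
    ],
)
def test_read_throughput(streaming: bool, concurrent_files, batch_size) -> None:
    # Hypothetical body: the real test reads the benchmark table with the
    # given configuration and reports wall time and PyArrow allocated bytes.
    assert isinstance(streaming, bool)
```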
