Commit ecf72d1

committed
Some changes as per review comments
1 parent d8f9411 commit ecf72d1

2 files changed: +30 -12 lines changed

mkdocs/docs/recipe-count.md

Lines changed: 18 additions & 0 deletions
@@ -57,6 +57,24 @@ The count operation is highly efficient because:
 - **Filter pushdown**: Eliminates files that don't match criteria
 - **Cached statistics**: Utilizes pre-computed record counts

+!!! tip "Even Faster: Use Snapshot Properties"
+
+    For the fastest possible total row count (without filters), you can access the cached count directly from snapshot properties, avoiding any table scanning:
+
+    ```python
+    # Get total records from snapshot metadata (fastest method)
+    total_records = table.current_snapshot().summary.additional_properties["total-records"]
+    print(f"Total rows from snapshot: {total_records}")
+    ```
+
+    **When to use this approach:**
+    - When you need the total table row count without any filters
+    - For dashboard queries that need instant response times
+    - When working with very large tables where even metadata scanning takes time
+    - For monitoring and alerting systems that check table sizes frequently
+
+    **Note:** This method only works for total counts. For filtered counts, use `table.scan().filter(...).count()`.
+
 ## Test Scenarios

 Our test suite validates count behavior across different scenarios:
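
Note on the snapshot-properties tip in this file: the one-liner assumes the table has a current snapshot and that its summary carries a `total-records` entry. A more defensive variant might look like the sketch below. The helper name `total_records_from_snapshot` and the `SimpleNamespace` stand-ins are hypothetical and exist only so the example runs without a catalog. The attribute chain (`current_snapshot()`, `summary`, `additional_properties`) is taken from the doc change itself, and the `int()` conversion assumes the value is stored as a string, as Iceberg snapshot summaries are string-to-string maps.

```python
from types import SimpleNamespace
from typing import Any, Optional


def total_records_from_snapshot(table: Any) -> Optional[int]:
    """Return the cached total row count, or None when it is unavailable.

    Hypothetical helper: it only relies on the attribute chain used in the
    doc snippet (current_snapshot().summary.additional_properties).
    """
    snapshot = table.current_snapshot()  # may be None for a table with no snapshots
    if snapshot is None or snapshot.summary is None:
        return None
    raw = snapshot.summary.additional_properties.get("total-records")
    # Assumption: the summary stores the count as a string, so convert it.
    return int(raw) if raw is not None else None


# Stand-in objects for illustration only; a real call would pass a loaded table.
fake_summary = SimpleNamespace(additional_properties={"total-records": "42"})
fake_table = SimpleNamespace(
    current_snapshot=lambda: SimpleNamespace(summary=fake_summary)
)
empty_table = SimpleNamespace(current_snapshot=lambda: None)

print(total_records_from_snapshot(fake_table))
print(total_records_from_snapshot(empty_table))
```

Returning `None` rather than raising keeps the "no snapshot yet" case distinguishable from a genuine count of zero, which may matter for the monitoring use case listed in the tip.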

tests/table/test_count.py

Lines changed: 12 additions & 12 deletions
@@ -59,17 +59,17 @@ def test_count_basic():
     - The count() method aggregates these counts efficiently
     """
     # Create a mock table with the necessary attributes
-    table = Mock(spec=DataScan)
+    scan = Mock(spec=DataScan)

     # Mock the plan_files method to return our dummy task
     task = DummyTask(42, residual=AlwaysTrue(), delete_files=[])
-    table.plan_files = MagicMock(return_value=[task])
+    scan.plan_files = MagicMock(return_value=[task])

     # Import and call the actual count method
     from pyiceberg.table import DataScan as ActualDataScan
-    table.count = ActualDataScan.count.__get__(table, ActualDataScan)
+    scan.count = ActualDataScan.count.__get__(scan, ActualDataScan)

-    assert table.count() == 42
+    assert scan.count() == 42


 def test_count_empty():
@@ -86,16 +86,16 @@ def test_count_empty():
     - Tables with restrictive filters that match no data
     """
     # Create a mock table with the necessary attributes
-    table = Mock(spec=DataScan)
+    scan = Mock(spec=DataScan)

     # Mock the plan_files method to return no tasks
-    table.plan_files = MagicMock(return_value=[])
+    scan.plan_files = MagicMock(return_value=[])

     # Import and call the actual count method
     from pyiceberg.table import DataScan as ActualDataScan
-    table.count = ActualDataScan.count.__get__(table, ActualDataScan)
+    scan.count = ActualDataScan.count.__get__(scan, ActualDataScan)

-    assert table.count() == 0
+    assert scan.count() == 0


 def test_count_large():
@@ -113,17 +113,17 @@ def test_count_large():
     - Distributed data scenarios common in big data environments
     """
     # Create a mock table with the necessary attributes
-    table = Mock(spec=DataScan)
+    scan = Mock(spec=DataScan)

     # Mock the plan_files method to return multiple tasks
     tasks = [
         DummyTask(500000, residual=AlwaysTrue(), delete_files=[]),
         DummyTask(500000, residual=AlwaysTrue(), delete_files=[]),
     ]
-    table.plan_files = MagicMock(return_value=tasks)
+    scan.plan_files = MagicMock(return_value=tasks)

     # Import and call the actual count method
     from pyiceberg.table import DataScan as ActualDataScan
-    table.count = ActualDataScan.count.__get__(table, ActualDataScan)
+    scan.count = ActualDataScan.count.__get__(scan, ActualDataScan)

-    assert table.count() == 1000000
+    assert scan.count() == 1000000
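
Note on the test pattern used in all three tests: `ActualDataScan.count.__get__(scan, ActualDataScan)` binds the real, unmocked `count` function to the mock, so the real aggregation logic runs against the mocked `plan_files`. The sketch below illustrates the same mechanism in isolation. `Scan` and `Task` are hypothetical stand-ins, not PyIceberg classes, and the `count` shown here is a simplified sum that ignores the residuals and delete files the real method has to consider.

```python
from unittest.mock import MagicMock, Mock


class Task:
    """Hypothetical stand-in for a scan task carrying a record count."""

    def __init__(self, record_count: int) -> None:
        self.record_count = record_count


class Scan:
    """Hypothetical stand-in for a scan class (not PyIceberg's DataScan)."""

    def plan_files(self):
        raise NotImplementedError("would normally read table metadata")

    def count(self) -> int:
        # Simplified aggregation: sum the per-task record counts.
        return sum(task.record_count for task in self.plan_files())


# The mock follows Scan's interface, but every method is a stub by default.
scan = Mock(spec=Scan)
scan.plan_files = MagicMock(return_value=[Task(40), Task(2)])

# Functions are descriptors: __get__ returns a method bound to `scan`,
# so the real Scan.count body runs with the mock as `self`.
scan.count = Scan.count.__get__(scan, Scan)

result = scan.count()
print(result)
```

Without the rebinding step, `scan.count()` would return another `Mock` instead of an integer, so the assertions in the tests would not be exercising the real method at all.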
