Commit bae62df
Use a balanced tree instead of unbalanced one (#1830)
**Use a balanced tree instead of an unbalanced one to prevent recursion
error in `create_match_filter`**
<!-- Closes #1776 -->
## Rationale for this change
In the `create_match_filter` function, the previous implementation used
`functools.reduce(operator.or_, filters)` to combine expressions. This
approach constructed a deeply nested, unbalanced tree whose depth grows
linearly with the number of expressions, which could lead to a
`RecursionError` when dealing with a large number of expressions
(e.g., over 1,000).
To address this, we've introduced the `_build_balanced_tree` function.
This utility constructs a balanced binary tree of expressions, reducing
the maximum depth to O(log n) and thereby preventing potential recursion
errors. This makes expression construction more stable and scalable,
especially when working with large datasets.
```python
# Past behavior
Or(*[A, B, C, D]) = Or(A, Or(B, Or(C, D)))
# New behavior
Or(*[A, B, C, D]) = Or(Or(A, B), Or(C, D))
```
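To make the balanced construction concrete, here is a minimal sketch of the divide-and-conquer idea. It is not the actual `_build_balanced_tree` from this commit: the names `build_balanced_tree` and `or_node` are hypothetical, and a plain tuple stands in for a real `Or` expression.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def build_balanced_tree(combine: Callable[[T, T], T], items: Sequence[T]) -> T:
    """Combine items pairwise so that nesting depth grows as O(log n)."""
    if not items:
        raise ValueError("no items to combine")
    if len(items) == 1:
        return items[0]
    # Split in the middle and combine the two halves; the recursion here
    # is itself only O(log n) deep.
    mid = len(items) // 2
    left = build_balanced_tree(combine, items[:mid])
    right = build_balanced_tree(combine, items[mid:])
    return combine(left, right)


def or_node(left, right):
    # Hypothetical stand-in for an `Or` expression: a tagged tuple.
    return ("Or", left, right)


balanced = build_balanced_tree(or_node, ["A", "B", "C", "D"])
print(balanced)  # ('Or', ('Or', 'A', 'B'), ('Or', 'C', 'D'))
```

Splitting at the midpoint keeps both subtrees within one element of each other in size, which is what bounds the depth logarithmically.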
## Are these changes tested?
Yes, existing tests cover the functionality of `Or`. Additional testing
was done with large expression sets (e.g., 10,000 items) to ensure that
balanced tree construction avoids recursion errors.
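
A check along these lines can be sketched without the library. The example below is an illustration under stated assumptions, not the test added in this commit: `or_node`, `build_balanced_tree`, and `depth` are hypothetical helpers, and tuples stand in for expressions. It compares the nesting depth of a `functools.reduce` chain with that of a balanced tree for 10,000 leaves.

```python
import functools
import sys


def or_node(left, right):
    # Hypothetical stand-in for an `Or` expression: a tagged tuple.
    return ("Or", left, right)


def build_balanced_tree(combine, items):
    """Combine items pairwise so that nesting depth grows as O(log n)."""
    if len(items) == 1:
        return items[0]
    mid = len(items) // 2
    return combine(
        build_balanced_tree(combine, items[:mid]),
        build_balanced_tree(combine, items[mid:]),
    )


def depth(tree):
    """Measure nesting depth iteratively so the check cannot overflow the stack."""
    deepest = 0
    stack = [(tree, 0)]
    while stack:
        node, level = stack.pop()
        if isinstance(node, tuple):
            stack.append((node[1], level + 1))
            stack.append((node[2], level + 1))
        else:
            deepest = max(deepest, level)
    return deepest


leaves = [f"x == {i}" for i in range(10_000)]
chained_depth = depth(functools.reduce(or_node, leaves))
balanced_depth = depth(build_balanced_tree(or_node, leaves))

# The chain nests once per extra leaf; the balanced tree nests ~log2(n) times.
print(chained_depth, balanced_depth, sys.getrecursionlimit())
```

Any recursive traversal of the chained tree would need roughly one stack frame per leaf, which exceeds CPython's default recursion limit of 1,000; the balanced tree stays far below it.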
## Are there any user-facing changes?
No, there are no user-facing changes. This is an internal implementation
improvement that does not affect the public API.
Closes #1759
Closes #1785
1 parent 4b15fb6 commit bae62df
File tree
5 files changed, +78 -31 lines changed
- pyiceberg
  - expressions
  - table
- tests
  - expressions
  - io