Commit 562de40
Fix
Found out I broke this myself after doing a `git bisect`:
````
36d383d is the first bad commit
commit 36d383d
Author: Fokko Driesprong <fokko@apache.org>
Date: Thu Jan 23 07:50:54 2025 +0100
PyArrow: Avoid buffer-overflow by avoid doing a sort (#1555)
Second attempt of #1539
This was already being discussed back here:
#208 (comment)
This PR changes from doing a sort, and then a single pass over the table
to the approach where we determine the unique partition tuples filter on
them individually.
Fixes #1491
Because the sort caused buffers to be joined where it would overflow in
Arrow. I think this is an issue on the Arrow side, and it should
automatically break up into smaller buffers. The `combine_chunks` method
does this correctly.
Now:
```
0.42877754200890195
Run 1 took: 0.2507691659993725
Run 2 took: 0.24833179199777078
Run 3 took: 0.24401691700040828
Run 4 took: 0.2419595829996979
Average runtime of 0.28 seconds
```
Before:
```
Run 0 took: 1.0768639159941813
Run 1 took: 0.8784021250030492
Run 2 took: 0.8486490420036716
Run 3 took: 0.8614017910003895
Run 4 took: 0.8497851670108503
Average runtime of 0.9 seconds
```
So it comes with a nice speedup as well :)
---------
Co-authored-by: Kevin Liu <kevinjqliu@users.noreply.github.com>
pyiceberg/io/pyarrow.py | 129 ++-
pyiceberg/partitioning.py | 39 +-
pyiceberg/table/__init__.py | 6 +-
pyproject.toml | 1 +
tests/benchmark/test_benchmark.py | 72 ++
tests/integration/test_partitioning_key.py | 1299 ++++++++++++++--------------
tests/table/test_locations.py | 2 +-
7 files changed, 805 insertions(+), 743 deletions(-)
create mode 100644 tests/benchmark/test_benchmark.py
````
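The quoted commit message describes moving from "sort, then a single pass" to "determine the unique partition tuples and filter on each". The sketch below illustrates only the shape of those two strategies in plain Python. It is not the PyIceberg implementation, which works on PyArrow tables. The row data, the field names `region` and `year`, and both function names are hypothetical.

```python
from operator import itemgetter

# Hypothetical toy rows; "region" and "year" stand in for partition fields.
ROWS = [
    {"region": "eu", "year": 2024, "value": 1},
    {"region": "us", "year": 2025, "value": 2},
    {"region": "eu", "year": 2024, "value": 3},
    {"region": "us", "year": 2024, "value": 4},
]
PARTITION_FIELDS = ("region", "year")  # two or more fields, so itemgetter returns a tuple


def partition_by_sort(rows, fields):
    """Earlier shape: sort by partition key, then group the sorted rows in one pass."""
    key = itemgetter(*fields)
    groups = {}
    for row in sorted(rows, key=key):
        groups.setdefault(key(row), []).append(row)
    return groups


def partition_by_unique_keys(rows, fields):
    """Later shape: collect the unique partition tuples, then filter once per tuple."""
    key = itemgetter(*fields)
    unique_keys = {key(row) for row in rows}
    return {k: [row for row in rows if key(row) == k] for k in unique_keys}


if __name__ == "__main__":
    by_sort = partition_by_sort(ROWS, PARTITION_FIELDS)
    by_filter = partition_by_unique_keys(ROWS, PARTITION_FIELDS)
    # Both strategies produce the same groups; only the way they get there differs.
    print(by_sort == by_filter)
```

In this toy version the filter approach rescans the rows once per unique tuple. The benefit described in the commit message is about avoiding the buffer joins that the Arrow sort triggered, which a plain-Python sketch cannot reproduce.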
Closes #1917
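As a quick check on the benchmark numbers quoted in the bisect output, the snippet below recomputes both averages and the resulting speedup from the listed timings. It assumes the unlabeled first value under "Now:" is run 0; the reported average of 0.28 seconds only works out if that value is included.

```python
from statistics import mean

# Timings in seconds, copied from the quoted benchmark output.
# Assumption: the unlabeled first "Now" value is run 0.
now = [
    0.42877754200890195,
    0.2507691659993725,
    0.24833179199777078,
    0.24401691700040828,
    0.2419595829996979,
]
before = [
    1.0768639159941813,
    0.8784021250030492,
    0.8486490420036716,
    0.8614017910003895,
    0.8497851670108503,
]

avg_now = mean(now)
avg_before = mean(before)
speedup = avg_before / avg_now

print(f"Average now:    {avg_now:.2f} s")     # 0.28 s
print(f"Average before: {avg_before:.1f} s")  # 0.9 s
print(f"Speedup:        {speedup:.1f}x")      # 3.2x
```

So the "nice speedup" mentioned in the commit message is roughly a factor of three on this benchmark.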
add_files with non-identity transforms (#1925)
1 parent ddd7225 commit 562de40
2 files changed: +50 -14 lines changed