-
Notifications
You must be signed in to change notification settings - Fork 1.9k
WIP: Upgrade DataFusion to arrow-rs/parquet 57.2.0 #19355
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
7ec33db to
24d3a15
Compare
| # Upstream arrow-rs issue: https://github.com/apache/arrow-rs/issues/8841 | ||
| # This should succeed after we receive the fix | ||
| query error Arrow error: Compute error: Internal Error: Cannot cast BinaryView to BinaryArray of expected type | ||
| query I |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Jefffrey fixed them upstream ❤️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this fix will also close #19290; quickly tested checking this PR branch out and trying the test in the issue and it seems to execute successfully
| # coerce structs with different field orders, | ||
| # (note the *value*s are from column2 but the field name is 'xxx', as the coerced | ||
| # type takes the field name from the last argument (column3) | ||
| # should keep the same field values |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These tests now work thanks to @brancz in
| drop table t; | ||
|
|
||
| query error DataFusion error: Optimizer rule 'simplify_expressions' failed[\s\S]*Arrow error: Cast error: Cannot cast string 'a' to value of Float64 type | ||
| statement ok |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These tests now work thanks to @brancz in
|
run benchmarks |
|
run benchmarks |
This comment was marked as outdated.
This comment was marked as outdated.
|
🤖: Benchmark completed Details
|
|
run benchmarks |
This comment was marked as outdated.
This comment was marked as outdated.
|
🤖: Benchmark completed Details
|
24d3a15 to
ff98f2f
Compare
|
run benchmarks |
ff98f2f to
4155056
Compare
4155056 to
70e2841
Compare
|
🤖 |
|
🤖: Benchmark completed Details
|
|
run benchmark clickbench_partitioned |
|
🤖 |
|
🤖: Benchmark completed Details
|
|
run benchmark clickbench_partitioned |
|
🤖 |
|
🤖: Benchmark completed Details
|
# Which issue does this PR close? - Part of #8465 # Rationale for this change - See #8465 # What changes are included in this PR? 1. Update version to 57.2.0 2. Add CHANGELOG. See rendered version https://github.com/alamb/arrow-rs/blob/alamb/prepare_57.2.0/CHANGELOG.md # Are these changes tested? By CI and I am testing them manually in DataFusion here: - apache/datafusion#19355 # Are there any user-facing changes? Version and changelog
|
I do see a reproducable speedup on q28 which I think is real Query 28 is:
SELECT REGEXP_REPLACE("Referer", '^https?://(?:www\.)?([^/]+)/.*$', '\1') AS k, AVG(length("Referer")) AS l, COUNT(*) AS c, MIN("Referer") FROM hits WHERE "Referer" <> '' GROUP BY k HAVING COUNT(*) > 100000 ORDER BY l DESC LIMIT 25;It is not clear to me what is responsible |
Which issue does this PR close?
57.2.0(December 2025) arrow-rs#8465Rationale for this change
Upgrade to latest arrow version
I am making this PR early to test the arrow release with DataFusion
What changes are included in this PR?
Are these changes tested?
Yes by CI
Are there any user-facing changes?
No