-
Notifications
You must be signed in to change notification settings - Fork 260
feat: Support ANSI mode sum expr (int inputs) #2600
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Support ANSI mode sum expr (int inputs) #2600
Conversation
|
Draft PR to support sum function - WIP |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #2600 +/- ##
============================================
+ Coverage 56.12% 59.59% +3.46%
- Complexity 976 1376 +400
============================================
Files 119 167 +48
Lines 11743 15492 +3749
Branches 2251 2567 +316
============================================
+ Hits 6591 9232 +2641
- Misses 4012 4962 +950
- Partials 1140 1298 +158 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
970d693 to
c0715aa
Compare
|
@andygrove , @comphead I believe this should be ready for review (pending CI) |
spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala
Outdated
Show resolved
Hide resolved
spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala
Outdated
Show resolved
Hide resolved
Thats actually interesting why would window fall back, we planning to fallback on windows in #2726 but the test would still preserve windows, so I think we need to check a fallback reason |
6dcf47b to
13b7d68
Compare
This is no longer a failing test after I rebased with main branch |
|
@andygrove @comphead . All the tests passed and the PR is rest for review |
andygrove
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@coderfender This LGTM. I am going to pull the PR locally and do some testing/benchmarking today. Could you rebease/upmerge?
|
Rebased with main |
andygrove
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @coderfender. LGTM pending CI.
|
@comphead could you also review this PR when you have time? |
|
Fixed clippy errors |
|
Thank you for the approval @andygrove |
| "null_tbl") { | ||
| val res = sql("SELECT try_sum(_1) FROM null_tbl") | ||
| checkSparkAnswerAndOperator(res) | ||
| assert(res.collect() === Array(Row(null))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we really need this assertion after checkSparkAnswerAndOperator ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spark/src/test/scala/org/apache/comet/exec/CometAggregateSuite.scala
Outdated
Show resolved
Hide resolved
comphead
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @coderfender for this great PR, I left couple of nits and personally I'm feeling separate implementation would be easier to navigate and support, WDYT?
|
Thank you for the comments @comphead . I addressed the review comments and pushed a fix. Please take a look whenever you get a chance |
comphead
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @coderfender this is an epic PR
|
Thank you @andygrove , @comphead for the guidance and mentorship :) |
Which issue does this PR close?
Closes #531 .
(Partially closes 531) . The ANSI changes to support
AVGwill be tracked in a new PRRationale for this change
DataFusion's default SUM doesn't match Spark's overflow semantics. This implementation ensures we get the exact same behavior as Spark for integer overflow in all 3 eval modes (ANSI , Legacy and Try mode)
What changes are included in this PR?
This PR adds native Rust implementation for SUM aggregation on integer types (byte, short, int, long)
Native changes (Rust):
(Inspired from
SumDecimaland spark's SUM impl)SumIntegeraggregate function that handles SUM for all integer types (in coherence with Spark)( Implemented code in similar fashion of spark leveraging
Option<i64>to represent NULL and numeric values for sum , and an additional parameter calledhas_all_nullswhich is leveraged in Try mode to distinguish if NULL sum is caused by all NULL inputs or the fact that the sum overflowed. (Spark does this withshouldTrackIsEmptyand assigning NULL to running sum which is a long datatype) )Scala side changes :
CometSumto add ANSI support (ANSI and Try)eval_modeinstead offail_on_errorto supportLegacy, ANSI and Tryeval modesjava.lang.Longto avoid Scala's auto-boxing feature which auto casts objects to primitive types there by casting nulls to 0s) handling in both simple and GROUP BY agg .How are these changes tested?