-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Preserve ORDER BY in Unparser for projection -> order by pattern #19483
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
@alamb @goldmedal @y-f-u could you folks take a look at this since you originally added this bit of code in #11527? As far as I can tell this has kept all of those tests passing and only produced some formatting changes in one test's SQL, but I'm not familiar with the Unparser code in general so this needs some critical thought. |
datafusion/core/tests/sql/orderby.rs
Outdated
| SELECT | ||
| col * 2 as x_bucket, | ||
| count(*) | ||
| FROM t1 | ||
| GROUP BY x_bucket | ||
| ORDER BY x_bucket, count(*) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can probably move this to a test in plan_to_sql.rs but I struggled a bit translating it since there's limited functions available (e.g. count(*)). I do also think e2e tests with data are useful in that they don't require a specific SQL representation as long as query semantics are maintained. But I will try to port again once we get some initial feedback here.
7c40d24 to
3019e75
Compare
| assert_snapshot!( | ||
| sql, | ||
| @"SELECT j1.j1_id, j1.j1_string, lochierarchy FROM (SELECT j1.j1_id, j1.j1_string, (grouping(j1.j1_id) + grouping(j1.j1_string)) AS lochierarchy, grouping(j1.j1_string), grouping(j1.j1_id) FROM j1 GROUP BY ROLLUP (j1.j1_id, j1.j1_string) ORDER BY lochierarchy DESC NULLS FIRST, CASE WHEN ((grouping(j1.j1_id) + grouping(j1.j1_string)) = 0) THEN j1.j1_id END ASC NULLS LAST) LIMIT 100" | ||
| @r#"SELECT j1.j1_id, j1.j1_string, lochierarchy FROM (SELECT j1.j1_id, j1.j1_string, (grouping(j1.j1_id) + grouping(j1.j1_string)) AS lochierarchy, grouping(j1.j1_string), grouping(j1.j1_id) FROM j1 GROUP BY ROLLUP (j1.j1_id, j1.j1_string)) ORDER BY lochierarchy DESC NULLS FIRST, CASE WHEN (("grouping(j1.j1_id)" + "grouping(j1.j1_string)") = 0) THEN j1.j1_id END ASC NULLS LAST LIMIT 100"# |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Formatted difference:
- @"SELECT
- j1.j1_id,
- j1.j1_string,
- lochierarchy
- FROM (
- SELECT
- j1.j1_id,
- j1.j1_string,
- (grouping(j1.j1_id) + grouping(j1.j1_string)) AS lochierarchy,
- grouping(j1.j1_string),
- grouping(j1.j1_id)
- FROM j1
- GROUP BY ROLLUP (j1.j1_id, j1.j1_string)
- ORDER BY
- lochierarchy DESC NULLS FIRST,
- CASE
- WHEN ((grouping(j1.j1_id) + grouping(j1.j1_string)) = 0) THEN j1.j1_id
- END ASC NULLS LAST
- )
- LIMIT 100"
+ @r#"SELECT
+ j1.j1_id,
+ j1.j1_string,
+ lochierarchy
+ FROM (
+ SELECT
+ j1.j1_id,
+ j1.j1_string,
+ (grouping(j1.j1_id) + grouping(j1.j1_string)) AS lochierarchy,
+ grouping(j1.j1_string),
+ grouping(j1.j1_id)
+ FROM j1
+ GROUP BY ROLLUP (j1.j1_id, j1.j1_string)
+ )
+ ORDER BY
+ lochierarchy DESC NULLS FIRST,
+ CASE
+ WHEN (("grouping(j1.j1_id)" + "grouping(j1.j1_string)") = 0) THEN j1.j1_id
+ END ASC NULLS LAST
+ LIMIT 100"#As you can see the ORDER BY got moved outside of the subquery, which is what we want.
|
I've added a property based test that asserts the property that results should be the same after unparsing and re-parsing a query given the same input data*. I think this is a good test because:
*: Not all queries have a deterministic sort order. I check if the original query has a known output ordering and if it doesn't I sort both outputs. These tests show that without these fixes there are two issues for ClickBench queries:
Here is the failure output (also relevant to judge since the tests are being added): |
Because of #15886 a parse -> unparse -> parse loop changed the query so that it would give incorrect results.