Add implementation of SQL Except operation #135
demianw wants to merge 2 commits into dask-contrib:main from
Conversation
nils-braun left a comment
Thank you so much @demianw! That is a very nice add-on!
If you want, you can also add it to the documentation in select.rst (e.g. below the UNION section).
I have a few comments on the changes, but in general I am already quite happy with it.
rel_string = str(generator.getRelationalAlgebraString(rel))
logger.debug(
    f"Non optimised query plan: \n "
    f"{str(generator.getRelationalAlgebraString(nonOptimizedRelNode))}"
Ok, makes sense to do that. Good idea.
class LogicalMinusPlugin(BaseRelPlugin):
    """
    LogicalUnion is used on EXCEPT clauses.
I guess you still would like to update that docstring :-)
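For example, something along these lines could work - the wording is only a suggestion, and the import path is an assumption about where BaseRelPlugin lives:

```python
# Suggested docstring wording only; adjust to the plugin's actual behaviour.
# The import path is assumed, not taken from the PR.
from dask_sql.physical.rel.base import BaseRelPlugin


class LogicalMinusPlugin(BaseRelPlugin):
    """
    LogicalMinus is used on EXCEPT clauses: it keeps the rows of the
    first input that do not appear in the second input.
    """
```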
second_df = second_dc.df
second_cc = second_dc.column_container

# For concatenating, they should have exactly the same fields
Suggested change:
- # For concatenating, they should have exactly the same fields
+ # For subtracting, they should have exactly the same fields
second_df = second_dc.assign()

self.check_columns_from_row_type(first_df, rel.getExpectedInputRowType(0))
self.check_columns_from_row_type(second_df, rel.getExpectedInputRowType(1))
There is now a lot of code duplication between this and the LogicalUnion plugin. I think it would make sense to extract the basic functionality (the column name cleaning) into a function in utils.py and then reuse it here - or what do you think @demianw?
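A rough sketch of what such a shared helper in utils.py could look like - the name, signature and renaming strategy are only assumptions and would need to mirror what LogicalUnionPlugin actually does today:

```python
from typing import List

import dask.dataframe as dd


def rename_columns_to_match(df: dd.DataFrame, target_columns: List[str]) -> dd.DataFrame:
    """Positionally rename the columns of ``df`` to ``target_columns``.

    Both UNION and EXCEPT need their two inputs to end up with exactly the
    same fields, so this cleanup step could be shared by both plugins.
    """
    if len(df.columns) != len(target_columns):
        raise ValueError("Inputs must have the same number of columns")
    return df.rename(columns=dict(zip(df.columns, target_columns)))
```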
    indicator=True,
)

df = df[df.iloc[:, -1] == "left_only"].iloc[:, :-1]
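As a standalone illustration (plain pandas, independent of the PR's exact code): the indicator=True merge performs an anti-join, which is the core of EXCEPT - keep only the left rows that have no match on the right:

```python
import pandas as pd

left = pd.DataFrame({"a": [1, 2, 3]})
right = pd.DataFrame({"a": [2]})

# A left merge with indicator=True adds a "_merge" column marking each row as
# "both", "left_only" or "right_only"; filtering on "left_only" keeps only the
# rows of `left` with no match in `right`.
merged = left.merge(right, how="left", indicator=True)
result = merged[merged["_merge"] == "left_only"].drop(columns="_merge")

print(result)  # rows with a == 1 and a == 3
```

Note that full SQL EXCEPT semantics would additionally drop duplicate rows (e.g. via drop_duplicates()); the anti-join alone does not do that.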
)
result_df = result_df.compute()
assert result_df.columns == "a"
assert set(result_df["a"]) == set([1, 3])
Would you mind adding a test with NaNs? You can also use one of the prepared tables (e.g. user_table_nan) if it makes sense.
It might also make sense to test this functionality against sqlite - I am just worried that, especially for NULL (NaN), pandas/dask and SQL have different opinions (as I have seen so often, unfortunately...)
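A rough sketch of the kind of test meant here - it assumes the existing c context fixture and that the prepared user_table_nan table has a nullable column named c; both the query and the expected result are only illustrative and should ideally be cross-checked against sqlite:

```python
def test_except_nan(c):
    result_df = c.sql(
        """
        SELECT c FROM user_table_nan
        EXCEPT
        SELECT c FROM user_table_nan WHERE c IS NOT NULL
        """
    )
    result_df = result_df.compute()

    # Whatever rows survive should only be the NULL (NaN) ones - this is
    # exactly the case where pandas/dask NaN handling and SQL NULL handling
    # tend to disagree, hence the suggested cross-check against sqlite.
    assert result_df["c"].isna().all()
```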
@@ -0,0 +1,29 @@
def test_except_empty(c, df):
I think you do not need the df parameter here
Oh, and if you want to get rid of the style errors: make sure you use black version 19.10 (I have seen that this can make a difference).
@demianw - would you still like to work on the PR? I think your changes are absolutely worth being included and just need a small amount of tweaking!
There is also an added log line to show non-optimised query plans.