[SPARK-54881][SQL][FOLLOWUP] Extract simplifyNot method in BooleanSimplification#54112
[SPARK-54881][SQL][FOLLOWUP] Extract simplifyNot method in BooleanSimplification#54112cloud-fan wants to merge 1 commit intoapache:masterfrom
Conversation
…plification This is a followup to apache#53658. Instead of recursively calling the entire `actualExprTransformer` to optimize new Not expressions created by De Morgan's Laws, this PR extracts a `simplifyNot` method that only handles Not simplification and calls itself recursively. This is more efficient because: - We only apply the Not-specific simplifications (not the And/Or factor elimination logic) - We avoid the overhead of tree pattern matching and pruning checks - The recursion is more direct and focused Co-authored-by: Cursor <cursoragent@cursor.com>
JIRA Issue Information=== Improvement SPARK-54881 === This comment was automatically generated by GitHub Actions |
| case _ => not | ||
| } | ||
| } | ||
|
|
There was a problem hiding this comment.
Have 2 comments:
- Have found recursive calls to be less efficient, as compared to getting the recursive behaviour via call on an instance, where the instance itself is changing at every step.. like it is happening in transformDown
- Any future modification , say adding new case etc, would require analysis whether it should go in both the places ( original transform down, and new simplifyNot function, or only in once place... atleast this is what I feel.. though I may be wrong.
But if that identification would be straightforward and any hypothetical new case added would be needed only in one of the functions, then this distinction is good. Though recursive call on method will be a concern for complex expressions....
There was a problem hiding this comment.
Previously it was O(n^2): actualExprTransformer is called in transformExpressionsUpWithPruning, and itself also calls transformDownWithPruning. We repeatedly transform the sub-expression tree.
I don't think recursion is a concern, as these transform APIs are all recursive under the hood.
|
I am not sure there will be any change in terms of number of nodes being
traversed , in new and current code.. my understanding is that the moment
while traversing down, the Not operation is subsumed, the traversal will
stop as the transform down is with pruning.. so if Not pattern is absent
from the subtree it will stop, the way recursion will stop . Isn't it?
…On Tue, Feb 3, 2026, 2:16 AM Wenchen Fan ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
<#54112 (comment)>:
>
case Not(Not(e)) => e
case Not(IsNull(e)) => IsNotNull(e)
case Not(IsNotNull(e)) => IsNull(e)
+
+ case _ => not
}
}
Previously it was O(n^2): actualExprTransformer is called in
transformExpressionsUpWithPruning, and itself also calls
transformDownWithPruning. We repeatedly transform the sub-expression tree.
I don't think recursion is a concern, as these transform APIs are all
recursive under the hood.
—
Reply to this email directly, view it on GitHub
<#54112 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AC6XG2BQ654FG23MLRMINIL4KBYQVAVCNFSM6AAAAACTZIEGL6VHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZTONBUGM3TANBXGQ>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
|
@ahshahid checking Not pattern also traverse the tree. |
|
That is right, but if the Not is subsumed, the children would be unchanged,
and each child would have the flag mask already set, as they have undergone
transform up with pruning so the lazy flag buys must have set...this was
my understanding...
…On Tue, Feb 3, 2026, 3:05 AM Wenchen Fan ***@***.***> wrote:
*cloud-fan* left a comment (apache/spark#54112)
<#54112 (comment)>
@ahshahid <https://github.com/ahshahid> checking Not pattern also
traverse the tree.
—
Reply to this email directly, view it on GitHub
<#54112 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AC6XG2BZ2SZI3BGQAW27P334KB6F7AVCNFSM6AAAAACTZIEGL6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZTQNBQGY2DMNZYGE>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
There was a problem hiding this comment.
This is very similar what I suggested to amend the original PR: ahshahid#1, especially at this point: ahshahid@ab29d7b.
I think due to @ahshahid provided ruleId to the Not handling transformDownWithPruning(), unlikely it is ever O(n^2) in practice.
But I like the clean approach of this PR.
|
I am also fine with it.. |
|
@ahshahid I don't think Scala |
|
thanks for the review, merging to master! |
…plification ### What changes were proposed in this pull request? This is a followup to apache#53658. Instead of recursively calling the entire `actualExprTransformer` to optimize new Not expressions created by De Morgan's Laws, this PR extracts a `simplifyNot` method that only handles Not simplification and calls itself recursively. ### Why are the changes needed? This is more efficient because: - We only apply the Not-specific simplifications (not the And/Or factor elimination logic) - We avoid the overhead of tree pattern matching and pruning checks - The recursion is more direct and focused ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Existing tests. ### Was this patch authored or co-authored using generative AI tooling? Yes. Made with [Cursor](https://cursor.com) Closes apache#54112 from cloud-fan/not. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
…plification ### What changes were proposed in this pull request? This is a followup to apache#53658. Instead of recursively calling the entire `actualExprTransformer` to optimize new Not expressions created by De Morgan's Laws, this PR extracts a `simplifyNot` method that only handles Not simplification and calls itself recursively. ### Why are the changes needed? This is more efficient because: - We only apply the Not-specific simplifications (not the And/Or factor elimination logic) - We avoid the overhead of tree pattern matching and pruning checks - The recursion is more direct and focused ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Existing tests. ### Was this patch authored or co-authored using generative AI tooling? Yes. Made with [Cursor](https://cursor.com) Closes apache#54112 from cloud-fan/not. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
What changes were proposed in this pull request?
This is a followup to #53658. Instead of recursively calling the entire
actualExprTransformerto optimize new Not expressions created by De Morgan's Laws, this PR extracts asimplifyNotmethod that only handles Not simplification and calls itself recursively.Why are the changes needed?
This is more efficient because:
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Existing tests.
Was this patch authored or co-authored using generative AI tooling?
Yes.
Made with Cursor