Skip to content

Comments

[SPARK-54881][SQL][FOLLOWUP] Extract simplifyNot method in BooleanSimplification#54112

Closed
cloud-fan wants to merge 1 commit intoapache:masterfrom
cloud-fan:not
Closed

[SPARK-54881][SQL][FOLLOWUP] Extract simplifyNot method in BooleanSimplification#54112
cloud-fan wants to merge 1 commit intoapache:masterfrom
cloud-fan:not

Conversation

@cloud-fan
Copy link
Contributor

What changes were proposed in this pull request?

This is a followup to #53658. Instead of recursively calling the entire actualExprTransformer to optimize new Not expressions created by De Morgan's Laws, this PR extracts a simplifyNot method that only handles Not simplification and calls itself recursively.

Why are the changes needed?

This is more efficient because:

  • We only apply the Not-specific simplifications (not the And/Or factor elimination logic)
  • We avoid the overhead of tree pattern matching and pruning checks
  • The recursion is more direct and focused

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing tests.

Was this patch authored or co-authored using generative AI tooling?

Yes.

Made with Cursor

…plification

This is a followup to apache#53658. Instead of recursively calling the entire `actualExprTransformer` to optimize new Not expressions created by De Morgan's Laws, this PR extracts a `simplifyNot` method that only handles Not simplification and calls itself recursively.

This is more efficient because:
- We only apply the Not-specific simplifications (not the And/Or factor elimination logic)
- We avoid the overhead of tree pattern matching and pruning checks
- The recursion is more direct and focused

Co-authored-by: Cursor <cursoragent@cursor.com>
@github-actions
Copy link

github-actions bot commented Feb 3, 2026

JIRA Issue Information

=== Improvement SPARK-54881 ===
Summary: BooleanSimplification rule using transformExpressionsUp instead of transformExpressionsDown, is inefficient in some cases resulting in delayed idempotency
Assignee: Asif
Status: Resolved
Affected: ["4.0.2","3.5.8","4.1.2"]


This comment was automatically generated by GitHub Actions

@github-actions github-actions bot added the SQL label Feb 3, 2026
@cloud-fan
Copy link
Contributor Author

cc @ahshahid @peter-toth

case _ => not
}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have 2 comments:

  1. Have found recursive calls to be less efficient, as compared to getting the recursive behaviour via call on an instance, where the instance itself is changing at every step.. like it is happening in transformDown
  2. Any future modification , say adding new case etc, would require analysis whether it should go in both the places ( original transform down, and new simplifyNot function, or only in once place... atleast this is what I feel.. though I may be wrong.
    But if that identification would be straightforward and any hypothetical new case added would be needed only in one of the functions, then this distinction is good. Though recursive call on method will be a concern for complex expressions....

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously it was O(n^2): actualExprTransformer is called in transformExpressionsUpWithPruning, and itself also calls transformDownWithPruning. We repeatedly transform the sub-expression tree.

I don't think recursion is a concern, as these transform APIs are all recursive under the hood.

@ahshahid
Copy link
Contributor

ahshahid commented Feb 3, 2026 via email

@cloud-fan
Copy link
Contributor Author

@ahshahid checking Not pattern also traverse the tree.

@ahshahid
Copy link
Contributor

ahshahid commented Feb 3, 2026 via email

Copy link
Contributor

@peter-toth peter-toth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very similar what I suggested to amend the original PR: ahshahid#1, especially at this point: ahshahid@ab29d7b.

I think due to @ahshahid provided ruleId to the Not handling transformDownWithPruning(), unlikely it is ever O(n^2) in practice.
But I like the clean approach of this PR.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM for this simplification.

@ahshahid
Copy link
Contributor

ahshahid commented Feb 3, 2026

I am also fine with it..
Though for my learning perspective I would be interested in knowing your thoughts on cost of:
a) Same function of an instance being recursed
vs
2) Calling same function of different instance every time in recursion...
I know that for the case of a) there is a possibility of OOM or StackOverflow error for large depth... Would the same situation occur at same number of calls for #2?

@cloud-fan
Copy link
Contributor Author

@ahshahid I don't think Scala PartialFunction is magical and better than normal method. There is no function instance in my understanding, methods are always bound to an instance in JVM (only lambda is different I think).

@cloud-fan
Copy link
Contributor Author

thanks for the review, merging to master!

@cloud-fan cloud-fan closed this in 56b4a2d Feb 8, 2026
Yicong-Huang pushed a commit to Yicong-Huang/spark that referenced this pull request Feb 8, 2026
…plification

### What changes were proposed in this pull request?

This is a followup to apache#53658. Instead of recursively calling the entire `actualExprTransformer` to optimize new Not expressions created by De Morgan's Laws, this PR extracts a `simplifyNot` method that only handles Not simplification and calls itself recursively.

### Why are the changes needed?

This is more efficient because:
- We only apply the Not-specific simplifications (not the And/Or factor elimination logic)
- We avoid the overhead of tree pattern matching and pruning checks
- The recursion is more direct and focused

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing tests.

### Was this patch authored or co-authored using generative AI tooling?

Yes.

Made with [Cursor](https://cursor.com)

Closes apache#54112 from cloud-fan/not.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
rpnkv pushed a commit to rpnkv/spark that referenced this pull request Feb 18, 2026
…plification

### What changes were proposed in this pull request?

This is a followup to apache#53658. Instead of recursively calling the entire `actualExprTransformer` to optimize new Not expressions created by De Morgan's Laws, this PR extracts a `simplifyNot` method that only handles Not simplification and calls itself recursively.

### Why are the changes needed?

This is more efficient because:
- We only apply the Not-specific simplifications (not the And/Or factor elimination logic)
- We avoid the overhead of tree pattern matching and pruning checks
- The recursion is more direct and focused

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing tests.

### Was this patch authored or co-authored using generative AI tooling?

Yes.

Made with [Cursor](https://cursor.com)

Closes apache#54112 from cloud-fan/not.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants