Contributor

@steFaiz steFaiz commented Feb 11, 2026

Purpose

Currently, Spark converts a Between predicate into a conjunction of LessOrEqual and GreaterOrEqual. This PR recognizes that pattern and converts qualifying And CompoundPredicates into a single Between LeafPredicate.

Linked issue: none

Tests

Please see org.apache.paimon.spark.sql.SparkV2FilterConverterTestBase

API and Format

No changes.

Documentation

No changes.

Generative AI tooling

This PR is fully hand-written.

@steFaiz steFaiz marked this pull request as draft February 11, 2026 13:09
Contributor Author

steFaiz commented Feb 11, 2026

This PR also rewrites the data filters in PaimonBatchScanBuilder, trying to merge candidate filters into a Between LeafPredicate.
This is needed because Spark splits a Between into two separate LessOrEqual and GreaterOrEqual Spark predicates (not a single AND predicate with children).

@steFaiz steFaiz marked this pull request as ready for review February 12, 2026 04:26
import java.util
import java.util.Objects

object PredicateUtils {
Contributor

We could have a PredicateRewrite in core. Maybe invoked in ReadBuilder.withFilter.

Contributor Author

@steFaiz steFaiz Feb 12, 2026

@JingsongLi Thanks for your advice! I've refactored it. It's worth noting, though, that implementing this logic in core is much more complicated than in Spark. We have to consider broader scenarios, for example:

  1. recursive: OR(AND(a >= 1, a <= 10, a is not null), b > 10, ... )
  2. chained AND: AND(a >= 1, AND(a <= 10, b > 10)) (the two bounds on a could be converted to BETWEEN(a, 1, 10))

and more. I may have missed some scenarios.

Please take a look. I'm happy to improve my code and fix potential bugs.
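
To make the two scenarios above concrete, here is a minimal sketch of how such a rewrite could work. It is not the code in this PR and does not use Paimon's Predicate classes: the `Pred`, `Leaf`, `IsNotNull`, `Between`, `And` and `Or` types and the `rewrite`, `flatten` and `mergeBounds` helpers are all hypothetical, and literals are restricted to `int` for brevity. The idea is to recurse into OR children, flatten chained ANDs into one list of conjuncts, and then pair the first `>=` and first `<=` on the same field into one Between.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BetweenRewriteSketch {
    // Hypothetical toy model for illustration only; NOT Paimon's Predicate API.
    interface Pred {}
    record Leaf(String op, String field, int value) implements Pred {} // op: ">=", "<=", ">", ...
    record IsNotNull(String field) implements Pred {}
    record Between(String field, int lo, int hi) implements Pred {}
    record And(List<Pred> children) implements Pred {}
    record Or(List<Pred> children) implements Pred {}

    // Scenario 1: recurse into OR children so nested ANDs are rewritten too.
    static Pred rewrite(Pred p) {
        if (p instanceof Or or) {
            List<Pred> out = new ArrayList<>();
            for (Pred c : or.children()) out.add(rewrite(c));
            return new Or(out);
        }
        if (p instanceof And and) {
            List<Pred> flat = new ArrayList<>();
            flatten(and, flat);
            return mergeBounds(flat);
        }
        return p; // leaves are returned unchanged
    }

    // Scenario 2: AND(x, AND(y, z)) is flattened into the conjunct list [x, y, z].
    static void flatten(And and, List<Pred> out) {
        for (Pred c : and.children()) {
            if (c instanceof And nested) flatten(nested, out);
            else out.add(rewrite(c));
        }
    }

    // Pair the first ">=" and the first "<=" seen for each field into one Between.
    static Pred mergeBounds(List<Pred> conjuncts) {
        Map<String, Leaf> lower = new LinkedHashMap<>();
        Map<String, Leaf> upper = new LinkedHashMap<>();
        for (Pred c : conjuncts) {
            if (c instanceof Leaf l) {
                if (l.op().equals(">=")) lower.putIfAbsent(l.field(), l);
                if (l.op().equals("<=")) upper.putIfAbsent(l.field(), l);
            }
        }
        List<Pred> out = new ArrayList<>();
        for (Pred c : conjuncts) {
            if (c instanceof Leaf l && lower.get(l.field()) != null && upper.get(l.field()) != null) {
                Leaf lo = lower.get(l.field());
                Leaf hi = upper.get(l.field());
                if (l == lo) { // emit the Between at the position of the lower bound
                    out.add(new Between(l.field(), lo.value(), hi.value()));
                    continue;
                }
                if (l == hi) continue; // the upper bound is absorbed into the Between
            }
            out.add(c); // everything else (extra bounds, other fields) is kept as-is
        }
        return out.size() == 1 ? out.get(0) : new And(out);
    }

    public static void main(String[] args) {
        Pred chained = new And(List.of(
                new Leaf(">=", "a", 1),
                new And(List.of(new Leaf("<=", "a", 10), new Leaf(">", "b", 10)))));
        System.out.println(rewrite(chained));
    }
}
```

In this sketch the chained example becomes AND(BETWEEN(a, 1, 10), b > 10), and the recursive example keeps `a is not null` next to the new Between inside the OR. A second bound on the same field (for example another `a >= 3`) is deliberately left as a separate conjunct rather than folded in, which keeps the rewrite semantics-preserving without comparing literals.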

@steFaiz steFaiz force-pushed the spark_between_predicte branch from b7d5af4 to 1a0e215 Compare February 12, 2026 09:09
@steFaiz steFaiz force-pushed the spark_between_predicte branch from 1a0e215 to 992ef72 Compare February 12, 2026 09:14
@steFaiz steFaiz force-pushed the spark_between_predicte branch from 0736498 to 01a9d74 Compare February 12, 2026 09:31
2 participants