Commit bc2aa00

Support arbitrary literal in BETWEEN operator (#2567)
# Rationale for this change

We want to support calling BETWEEN on any valid column type, not just numeric columns. This extends support to filter expressions like `date_col BETWEEN '2025-01-01' AND '2025-01-02'`.

The `test_invalid_between` test was removed in favor of letting the type checks happen at evaluation time, when the literals are cast to the corresponding column's type for comparison.

## Are these changes tested?

Yes, tested locally by applying filters to tables.

## Are there any user-facing changes?

Yes, the `BETWEEN` operator signature has changed: it now supports all comparable column types.
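To illustrate what deferring the type check to evaluation time means, here is a minimal standard-library sketch. It is not PyIceberg's implementation: the `CASTS` table, the type names, and the `between` helper are hypothetical, chosen only to show literals being cast to the column's type just before an inclusive comparison.

```python
from datetime import date, datetime
from decimal import Decimal

# Hypothetical cast table: column type name -> converter for a raw literal.
CASTS = {
    "long": int,
    "decimal": lambda v: Decimal(str(v)),
    "date": date.fromisoformat,
    "timestamp": datetime.fromisoformat,
    "string": str,
}


def between(value, lower, upper, column_type):
    """Inclusive range check; both literals are cast to the column type first."""
    cast = CASTS[column_type]
    # A literal that cannot be cast raises here, at evaluation time,
    # rather than being rejected by the parser.
    low, high = cast(lower), cast(upper)
    return low <= value <= high
```

Under these assumptions, `between(date(2025, 1, 1), '2025-01-01', '2025-01-02', 'date')` holds because both bounds are inclusive, while a call such as `between(2, 'a', 'b', 'long')` fails with a `ValueError` only once the cast is attempted.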
1 parent fd5be9e commit bc2aa00

3 files changed, +18 -15 lines changed
mkdocs/docs/row-filter-syntax.md

Lines changed: 3 additions & 1 deletion
@@ -102,11 +102,13 @@ column NOT LIKE 'prefix%'
 
 ## BETWEEN
 
-The BETWEEN operator filters a numeric value against an inclusive range, e.g. `a between 1 and 2` is equivalent to `a >= 1 and a <= 2`.
+The BETWEEN operator filters a column against an inclusive range of two comparable literals, e.g. `a between 1 and 2` is equivalent to `a >= 1 and a <= 2`.
 
 ```sql
 column BETWEEN 1 AND 2
 column BETWEEN 1.0 AND 2.0
+column BETWEEN '2025-01-01' AND '2025-01-02'
+column BETWEEN '2025-01-01T00:00:00.000000' AND '2025-01-02T12:00:00.000000'
 ```
 
 ## Logical Operations

pyiceberg/expressions/parser.py

Lines changed: 1 addition & 2 deletions
@@ -107,7 +107,6 @@ def _(result: ParseResults) -> Reference:
 string = sgl_quoted_string.set_results_name("raw_quoted_string")
 decimal = common.real().set_results_name("decimal")
 integer = common.signed_integer().set_results_name("integer")
-number = common.number().set_results_name("number")
 literal = Group(string | decimal | integer | boolean).set_results_name("literal")
 literal_set = Group(
     DelimitedList(string) | DelimitedList(decimal) | DelimitedList(integer) | DelimitedList(boolean)
@@ -151,7 +150,7 @@ def _(result: ParseResults) -> Literal[L]:
 left_ref = column + comparison_op + literal
 right_ref = literal + comparison_op + column
 comparison = left_ref | right_ref
-between = column + BETWEEN + number + AND + number
+between = column + BETWEEN + literal + AND + literal
 
 
 @between.set_parse_action
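The grammar change above lets BETWEEN accept any literal and leaves the desugaring into a pair of inclusive comparisons unchanged. A rough standard-library sketch of that desugaring follows. It uses a regular expression instead of pyparsing, and the dataclasses, the `_LITERAL` pattern, and `parse_between` are hypothetical stand-ins, not PyIceberg's classes or parser.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class GreaterThanOrEqual:
    term: str
    literal: object


@dataclass(frozen=True)
class LessThanOrEqual:
    term: str
    literal: object


@dataclass(frozen=True)
class And:
    left: object
    right: object


# Hypothetical literal pattern: quoted string, decimal, integer, or boolean.
_LITERAL = r"'[^']*'|-?\d+\.\d+|-?\d+|true|false"
_BETWEEN = re.compile(
    rf"^\s*(\w+)\s+between\s+({_LITERAL})\s+and\s+({_LITERAL})\s*$",
    re.IGNORECASE,
)


def _to_literal(token: str):
    """Convert a matched token to a Python value without checking column types."""
    if token.startswith("'"):
        return token[1:-1]
    if token.lower() in ("true", "false"):
        return token.lower() == "true"
    return float(token) if "." in token else int(token)


def parse_between(expr: str) -> And:
    """Desugar `col BETWEEN lo AND hi` into `col >= lo AND col <= hi`."""
    match = _BETWEEN.match(expr)
    if match is None:
        raise ValueError(f"not a BETWEEN expression: {expr!r}")
    column, lower, upper = match.groups()
    return And(
        GreaterThanOrEqual(column, _to_literal(lower)),
        LessThanOrEqual(column, _to_literal(upper)),
    )
```

In this sketch, string and boolean bounds parse successfully, mirroring the removal of `test_invalid_between`. Whether such bounds make sense for a given column is left to a later evaluation step.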

tests/expressions/test_parser.py

Lines changed: 14 additions & 12 deletions
@@ -42,7 +42,7 @@
     Reference,
     StartsWith,
 )
-from pyiceberg.expressions.literals import DecimalLiteral, LongLiteral
+from pyiceberg.expressions.literals import DecimalLiteral, LongLiteral, literal
 
 
 def test_always_true() -> None:
@@ -241,7 +241,8 @@ def test_quoted_column_with_spaces() -> None:
     assert EqualTo("Foo Bar", "data") == parser.parse("\"Foo Bar\" = 'data'")
 
 
-def test_valid_between() -> None:
+def test_valid_between_with_numerics() -> None:
+    # numerics
     assert And(
         left=GreaterThanOrEqual(Reference(name="foo"), LongLiteral(1)),
         right=LessThanOrEqual(Reference(name="foo"), LongLiteral(3)),
@@ -254,16 +255,17 @@ def test_valid_between() -> None:
         left=GreaterThanOrEqual(Reference(name="foo"), DecimalLiteral(Decimal(1.0))),
         right=LessThanOrEqual(Reference(name="foo"), DecimalLiteral(Decimal(4.0))),
     ) == parser.parse("foo between 1.0 and 4.0")
-    assert parser.parse("foo between 1 and 3") == parser.parse("1 <= foo and foo <= 3")
 
+    # dates
+    assert And(
+        left=GreaterThanOrEqual(Reference(name="foo"), literal("2025-05-10")),
+        right=LessThanOrEqual(Reference(name="foo"), literal("2025-05-12")),
+    ) == parser.parse("foo between '2025-05-10' and '2025-05-12'")
 
-def test_invalid_between() -> None:
-    # boolean
-    with pytest.raises(ParseException) as exc_info:
-        parser.parse("foo between true and false")
-    assert "Expected number, found 'true'" in str(exc_info)
+    # timestamps
+    assert And(
+        left=GreaterThanOrEqual(Reference(name="foo"), literal("2025-01-01T00:00:00.000000")),
+        right=LessThanOrEqual(Reference(name="foo"), literal("2025-01-10T12:00:00.000000")),
+    ) == parser.parse("foo between '2025-01-01T00:00:00.000000' and '2025-01-10T12:00:00.000000'")
 
-    # string
-    with pytest.raises(ParseException) as exc_info:
-        parser.parse("foo between 'a' and 'b'")
-    assert 'Expected number, found "\'"' in str(exc_info)
+    assert parser.parse("foo between 1 and 3") == parser.parse("1 <= foo and foo <= 3")
