Feat/json serialize or expression #2565

jaimeferj · 2025-10-03T20:57:16Z

Rationale for this change

Are these changes tested?

yes

Are there any user-facing changes?

jaimeferj · 2025-10-19T22:30:50Z

pyiceberg/expressions/__init__.py

+    def __init__(self, left: BooleanExpression, right: BooleanExpression, *rest: BooleanExpression) -> None:
+        if isinstance(self, Or) and not hasattr(self, "left") and not hasattr(self, "right"):
+            super().__init__(left=left, right=right)
+


This weird __init__ is to make sure that if the class has been already initialized in __new__, it is not initialized again.
If this is not made, multiple things can happen:

If *rest is given, although _build_balanced_tree build an Or instance correctly, Python would call again Or initializer with the original parameters left and right resulting in Or(left, right), ignoring the created Or instance from _build_balanced_tree

If left or right is AlwaysTrue it will get overriden and the instance will be initialized as Or(left, right)

If left and right is AlwaysFalse it will get overriden and the instance will be initialized as Or(left, right)

I do not know why exactly but it gotta be something from Pydantic

I spent several hours debugging this, and it is pretty weird. Pydantic seems not to work well with __new__. I came up with the following:

class Or(IcebergBaseModel, BooleanExpression): """OR operation expression - logical disjunction.""" - model_config = ConfigDict(arbitrary_types_allowed=True) + model_config = ConfigDict(arbitrary_types_allowed=True, frozen=False) - type: TypingLiteral["str"] = Field(default="or", alias="type") - left: BooleanExpression - right: BooleanExpression + type: TypingLiteral["or"] = Field(default="or") + left: BooleanExpression = Field() + right: BooleanExpression = Field() def __init__(self, left: BooleanExpression, right: BooleanExpression, *rest: BooleanExpression) -> None: - if isinstance(self, Or) and not hasattr(self, "left") and not hasattr(self, "right"): - super().__init__(left=left, right=right) + super().__init__(left=self.left, right=self.right) def __new__(cls, left: BooleanExpression, right: BooleanExpression, *rest: BooleanExpression) -> BooleanExpression: # type: ignore if rest: @@ -326,20 +325,11 @@ class Or(IcebergBaseModel, BooleanExpression): return left else: obj = super().__new__(cls) + obj.__pydantic_fields_set__ = set() + obj.left = left + obj.right = right return obj

I see what direction you were going, but if we leave out the obj.left = left then we don't have the attributes and we have an empty Or object. The test is still failing, but I think that's because the UnaryPredicates are not yet serializable. If you update the tests to use LiteralPredicates, then it should work 👍

I cannot test it now, but I am pretty sure that I left out obj.left = left because of how Python works with __new__. If you return a class of the same type that happens:

If new() is invoked during object construction and it returns an instance of cls, then the new instance’s init() method will be invoked like init(self[, ...]), where self is the new instance and the remaining arguments are the same as were passed to the object constructor.
Ref: https://docs.python.org/3.12/reference/datamodel.html#object.__new__

So that is why I purposely left the object uninitialized, so that when the raw obj is returned it is initialized by __init__.

If the object has already being initialized by _build_balanced_tree I just leave it as is.

Another thing, the tests are passing in my implementation, as far as I know, so you know that Or is being instantiated correctly, right?

Earlier today I had failing tests, but I was still at an older commit of the branch. Let me check again

Fokko · 2025-10-20T14:56:25Z

pyiceberg/expressions/__init__.py


+    model_config = ConfigDict(arbitrary_types_allowed=True)
+
+    type: TypingLiteral["str"] = Field(default="or", alias="type")


Suggested change

type: TypingLiteral["str"] = Field(default="or", alias="type")

type: TypingLiteral["or"] = Field(default="or", alias="type")

Fokko · 2025-10-20T15:01:22Z

tests/expressions/test_expressions.py


+    assert (
+        or_.model_dump_json()
+        == '{"type":"or","left":"IsNull(term=Reference(name=\'a\'))","right":"IsNaN(term=Reference(name=\'b\'))"}'


This test is invalid, it should return something like:

Suggested change

== '{"type":"or","left":"IsNull(term=Reference(name=\'a\'))","right":"IsNaN(term=Reference(name=\'b\'))"}'

== '{"type":"or","left":{"type":"is-null","term":"a"},"right":{"type":"is-nan","term":"b"}}'

But the unary predicates are still WIP: #2598

Should I build an "easier" test for now? Something like Or("a", "b") and then check if {"type":"or","left":"a","right":"b"}?

I think if you replace it with EqualTo("a", 10) then it should produce a valid JSON encoded expression. On top of that, we can also remove these: https://github.com/apache/iceberg-python/pull/2565/files#diff-4d0ae566a5a0bf0ceba68c8119e95c3d2d71d92e70e0dd9b48589d3379c7f1caR331-R341

These will override the serializer when it encounters a non-Pydantic object in left/right.

I think a seperate test would be best :)

jaimeferj · 2025-10-21T22:27:03Z

Rebased to main for the LiteralPredicate changes to make the test.
Added the new test.
Removed the serializers.
Tests and linting passing

jaimeferj · 2025-10-22T06:55:09Z

CI failing at

- hook id: mypy
- exit code: 1
  tests/utils/test_manifest.py:51: error: Unused "type: ignore" comment  [unused-ignore]
  Found 1 error in 1 file (checked 166 source files)

PS: I see that you solved it in #2654

Fokko

LGTM! Thanks @jaimeferj

Fokko · 2025-10-22T07:19:43Z

@jaimeferj Also feel free to fix the CI in this branch

jaimeferj · 2025-10-22T09:25:10Z

Sorry for pushing again, I though that the linting error was fixed by #2654 but it still errors:

❯ make lint
python -m poetry run prek run -a
trim trailing whitespace.................................................Passed
fix end of files.........................................................Passed
debug statements (python)................................................Passed
check yaml...............................................................Passed
check python ast.........................................................Passed
ruff (legacy alias)......................................................Passed
ruff format..............................................................Passed
mypy.....................................................................Failed
- hook id: mypy
- exit code: 1
  tests/utils/test_manifest.py:51: error: "Callable[[FileIO, str], tuple[ManifestFile, ...]]" has no attribute "cache_clear"  [attr-defined]
  Found 1 error in 1 file (checked 166 source files)
markdownlint.............................................................Passed
pydocstyle...............................................................Passed
flynt....................................................................Passed
codespell................................................................Passed
make: *** [Makefile:87: lint] Error 1

Fokko · 2025-10-22T12:38:05Z

@jaimeferj It can be that it is due some dependency or something. Maybe make install will fix it?

jaimeferj · 2025-10-22T13:44:31Z

@jaimeferj It can be that it is due some dependency or something. Maybe make install will fix it?

I just reinstalled the whole thing again and it keeps giving the same error while linting. I'll try to sort it out later and if I cannot, I'll open an issue. Thanks

Fokko · 2025-10-22T15:44:25Z

@jaimeferj Maybe it is a caching issue, maybe run make clean prior to make install?

Fokko · 2025-10-22T15:44:42Z

Thanks for working on this @jaimeferj 🙌

jaimeferj force-pushed the feat/json-serialize-or-expression branch from 1d481ad to 26722cf Compare October 19, 2025 22:18

jaimeferj commented Oct 19, 2025

View reviewed changes

Fokko self-requested a review October 20, 2025 07:36

Fokko reviewed Oct 20, 2025

View reviewed changes

Fokko mentioned this pull request Oct 20, 2025

Make And expression JSON serializable using Pydantic #2560

Closed

jaimeferj force-pushed the feat/json-serialize-or-expression branch from e58863b to 417678a Compare October 21, 2025 22:25

Fokko approved these changes Oct 22, 2025

View reviewed changes

jaimeferj added 8 commits October 22, 2025 11:21

feat: make Or json serializable via IcebergBaseModel

c961dc3

fix: invalid creation of Or instance with multiple BooleanExpressions

873c022

feat: fix or instance creation for _build_balanced_tree

4f6a09e

revert: test_visitors

2638c8d

fix: typo in typingliteral

4197536

feat: add test for Or serialization alone

dc7a2b4

fix: test Or serialization

2a55dfb

fix: remove unnecessary serializer

1b18573

jaimeferj force-pushed the feat/json-serialize-or-expression branch from 417678a to 1b18573 Compare October 22, 2025 09:21

Fokko merged commit 9ce7619 into apache:main Oct 22, 2025
8 checks passed


		model_config = ConfigDict(arbitrary_types_allowed=True)

		type: TypingLiteral["str"] = Field(default="or", alias="type")

	type: TypingLiteral["str"] = Field(default="or", alias="type")
	type: TypingLiteral["or"] = Field(default="or", alias="type")

	== '{"type":"or","left":"IsNull(term=Reference(name=\'a\'))","right":"IsNaN(term=Reference(name=\'b\'))"}'
	== '{"type":"or","left":{"type":"is-null","term":"a"},"right":{"type":"is-nan","term":"b"}}'

Feat/json serialize or expression #2565

Feat/json serialize or expression #2565

Uh oh!

Conversation

jaimeferj commented Oct 3, 2025

Rationale for this change

Are these changes tested?

Are there any user-facing changes?

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jaimeferj Oct 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Fokko Oct 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jaimeferj commented Oct 21, 2025

Uh oh!

jaimeferj commented Oct 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Fokko left a comment

Choose a reason for hiding this comment

Uh oh!

Fokko commented Oct 22, 2025

Uh oh!

jaimeferj commented Oct 22, 2025

Uh oh!

Fokko commented Oct 22, 2025

Uh oh!

jaimeferj commented Oct 22, 2025

Uh oh!

Fokko commented Oct 22, 2025

Uh oh!

Uh oh!

Fokko commented Oct 22, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

jaimeferj Oct 20, 2025 •

edited

Loading

Fokko Oct 20, 2025 •

edited

Loading

jaimeferj commented Oct 22, 2025 •

edited

Loading