Skip to content

Conversation

@jaimeferj
Copy link
Contributor

Closes #2519

Rationale for this change

Are these changes tested?

yes

Are there any user-facing changes?

@jaimeferj jaimeferj force-pushed the feat/json-serialize-or-expression branch from 1d481ad to 26722cf Compare October 19, 2025 22:18
Comment on lines +314 to +317
def __init__(self, left: BooleanExpression, right: BooleanExpression, *rest: BooleanExpression) -> None:
if isinstance(self, Or) and not hasattr(self, "left") and not hasattr(self, "right"):
super().__init__(left=left, right=right)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This weird __init__ is to make sure that if the class has been already initialized in __new__, it is not initialized again.
If this is not made, multiple things can happen:

  • If *rest is given, although _build_balanced_tree build an Or instance correctly, Python would call again Or initializer with the original parameters left and right resulting in Or(left, right), ignoring the created Or instance from _build_balanced_tree
  • If left or right is AlwaysTrue it will get overriden and the instance will be initialized as Or(left, right)
  • If left and right is AlwaysFalse it will get overriden and the instance will be initialized as Or(left, right)

I do not know why exactly but it gotta be something from Pydantic

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I spent several hours debugging this, and it is pretty weird. Pydantic seems not to work well with __new__. I came up with the following:

 class Or(IcebergBaseModel, BooleanExpression):
     """OR operation expression - logical disjunction."""
 
-    model_config = ConfigDict(arbitrary_types_allowed=True)
+    model_config = ConfigDict(arbitrary_types_allowed=True, frozen=False)
 
-    type: TypingLiteral["str"] = Field(default="or", alias="type")
-    left: BooleanExpression
-    right: BooleanExpression
+    type: TypingLiteral["or"] = Field(default="or")
+    left: BooleanExpression = Field()
+    right: BooleanExpression = Field()
 
     def __init__(self, left: BooleanExpression, right: BooleanExpression, *rest: BooleanExpression) -> None:
-        if isinstance(self, Or) and not hasattr(self, "left") and not hasattr(self, "right"):
-            super().__init__(left=left, right=right)
+        super().__init__(left=self.left, right=self.right)
 
     def __new__(cls, left: BooleanExpression, right: BooleanExpression, *rest: BooleanExpression) -> BooleanExpression:  # type: ignore
         if rest:
@@ -326,20 +325,11 @@ class Or(IcebergBaseModel, BooleanExpression):
             return left
         else:
             obj = super().__new__(cls)
+            obj.__pydantic_fields_set__ = set()
+            obj.left = left
+            obj.right = right
             return obj

I see what direction you were going, but if we leave out the obj.left = left then we don't have the attributes and we have an empty Or object. The test is still failing, but I think that's because the UnaryPredicates are not yet serializable. If you update the tests to use LiteralPredicates, then it should work 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cannot test it now, but I am pretty sure that I left out obj.left = left because of how Python works with __new__. If you return a class of the same type that happens:

If new() is invoked during object construction and it returns an instance of cls, then the new instance’s init() method will be invoked like init(self[, ...]), where self is the new instance and the remaining arguments are the same as were passed to the object constructor.
Ref: https://docs.python.org/3.12/reference/datamodel.html#object.__new__

So that is why I purposely left the object uninitialized, so that when the raw obj is returned it is initialized by __init__.

If the object has already being initialized by _build_balanced_tree I just leave it as is.

Copy link
Contributor Author

@jaimeferj jaimeferj Oct 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another thing, the tests are passing in my implementation, as far as I know, so you know that Or is being instantiated correctly, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Earlier today I had failing tests, but I was still at an older commit of the branch. Let me check again

@Fokko Fokko self-requested a review October 20, 2025 07:36

model_config = ConfigDict(arbitrary_types_allowed=True)

type: TypingLiteral["str"] = Field(default="or", alias="type")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
type: TypingLiteral["str"] = Field(default="or", alias="type")
type: TypingLiteral["or"] = Field(default="or", alias="type")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


assert (
or_.model_dump_json()
== '{"type":"or","left":"IsNull(term=Reference(name=\'a\'))","right":"IsNaN(term=Reference(name=\'b\'))"}'
Copy link
Contributor

@Fokko Fokko Oct 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is invalid, it should return something like:

Suggested change
== '{"type":"or","left":"IsNull(term=Reference(name=\'a\'))","right":"IsNaN(term=Reference(name=\'b\'))"}'
== '{"type":"or","left":{"type":"is-null","term":"a"},"right":{"type":"is-nan","term":"b"}}'

But the unary predicates are still WIP: #2598

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I build an "easier" test for now? Something like Or("a", "b") and then check if {"type":"or","left":"a","right":"b"}?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if you replace it with EqualTo("a", 10) then it should produce a valid JSON encoded expression. On top of that, we can also remove these: https://github.com/apache/iceberg-python/pull/2565/files#diff-4d0ae566a5a0bf0ceba68c8119e95c3d2d71d92e70e0dd9b48589d3379c7f1caR331-R341

These will override the serializer when it encounters a non-Pydantic object in left/right.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a seperate test would be best :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@jaimeferj
Copy link
Contributor Author

  1. Rebased to main for the LiteralPredicate changes to make the test.
  2. Added the new test.
  3. Removed the serializers.
  4. Tests and linting passing

@jaimeferj
Copy link
Contributor Author

jaimeferj commented Oct 22, 2025

CI failing at

- hook id: mypy
- exit code: 1
  tests/utils/test_manifest.py:51: error: Unused "type: ignore" comment  [unused-ignore]
  Found 1 error in 1 file (checked 166 source files)

PS: I see that you solved it in #2654

Copy link
Contributor

@Fokko Fokko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks @jaimeferj

@Fokko
Copy link
Contributor

Fokko commented Oct 22, 2025

@jaimeferj Also feel free to fix the CI in this branch

@jaimeferj jaimeferj force-pushed the feat/json-serialize-or-expression branch from 417678a to 1b18573 Compare October 22, 2025 09:21
@jaimeferj
Copy link
Contributor Author

Sorry for pushing again, I though that the linting error was fixed by #2654 but it still errors:

❯ make lint
python -m poetry run prek run -a
trim trailing whitespace.................................................Passed
fix end of files.........................................................Passed
debug statements (python)................................................Passed
check yaml...............................................................Passed
check python ast.........................................................Passed
ruff (legacy alias)......................................................Passed
ruff format..............................................................Passed
mypy.....................................................................Failed
- hook id: mypy
- exit code: 1
  tests/utils/test_manifest.py:51: error: "Callable[[FileIO, str], tuple[ManifestFile, ...]]" has no attribute "cache_clear"  [attr-defined]
  Found 1 error in 1 file (checked 166 source files)
markdownlint.............................................................Passed
pydocstyle...............................................................Passed
flynt....................................................................Passed
codespell................................................................Passed
make: *** [Makefile:87: lint] Error 1

@Fokko
Copy link
Contributor

Fokko commented Oct 22, 2025

@jaimeferj It can be that it is due some dependency or something. Maybe make install will fix it?

@jaimeferj
Copy link
Contributor Author

@jaimeferj It can be that it is due some dependency or something. Maybe make install will fix it?

I just reinstalled the whole thing again and it keeps giving the same error while linting. I'll try to sort it out later and if I cannot, I'll open an issue. Thanks

@Fokko
Copy link
Contributor

Fokko commented Oct 22, 2025

@jaimeferj Maybe it is a caching issue, maybe run make clean prior to make install?

@Fokko Fokko merged commit 9ce7619 into apache:main Oct 22, 2025
8 checks passed
@Fokko
Copy link
Contributor

Fokko commented Oct 22, 2025

Thanks for working on this @jaimeferj 🙌

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Make or expression JSON serializable

2 participants