-
Notifications
You must be signed in to change notification settings - Fork 412
Add serializer for AssertRefSnapshotId allowing null json value #2375
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
kevinjqliu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! slight nit on making a few changes for maintainability :)
pyiceberg/table/update/__init__.py
Outdated
| return { | ||
| "type": self.type, | ||
| "ref": self.ref, | ||
| "snapshot-id": self.snapshot_id, | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: is there a way to call super() or the default serializer? we just want to explicitly override for snapshot-id.
otherwise, we'd have to remember to change to this function everytime we add a new variable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes! Added the mode='wrap' param that allows that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! very clean, very nice
Fokko
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for following up here @ox 🙌
Closes #2342
Rationale for this change
The last attempt at fixing this (#2343) did not actually end up serializing None values to
null. This was b/c theAssertRefSnapshotIdrequirement was often a child object of some otherRequestobject, which was actually getting themodel_dump_json()call. This meant that theexclude_none=Trueoverride would remove anyNones before any of theField-specific configurations would be considered (likeexclude=Falseor the model's ownmodel_dump_jsonmethod).I then investigated setting some
ConfigDictsettings on the model, but that also did not have any useful configurations.I looked into setting
exclude_none=Trueat all of themodel_json_dump()callsites but there were too many and it would be easy to forget to set that value every time. The issue is that many values are supposed to be absent to instruct the REST Catalog about what the client wants (things like filtering, for instance).The solution I ended up with was allowingsnapshot_idto be a requiredintwith a default value of-1, and adding a json-specific field serializer to write outnullwhen dumping to json. This seems to workThe solution that worked and allowed for
snapshot_idto be set toNone(as it's done in Ray Data), was to define amodel_serializermethod and always return the fields. This was based on conversations in pydantic/pydantic#7326.Are these changes tested?
Yes
Are there any user-facing changes?
Actually allow
snapshot-idto be null forassert-ref-snapshot-idrequirements