-
Notifications
You must be signed in to change notification settings - Fork 413
Allow snapshot-id in assert-ref-snapshot-id requirement to serialize to null in json #2343
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
kevinjqliu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks for the PR! This makes sense.
pyiceberg/table/update/__init__.py
Outdated
| # override the override method, allowing None to serialize to `null` instead of being omitted. | ||
| def model_dump_json( | ||
| self, exclude_none: bool = False, exclude: Optional[Set[str]] = None, by_alias: bool = True, **kwargs: Any | ||
| ) -> str: | ||
| return super().model_dump_json( | ||
| exclude_none=exclude_none, exclude=self._exclude_private_properties(exclude), by_alias=by_alias, **kwargs | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: i think we can simply this to
def model_dump_json(self) -> str:
# `snapshot-id` is required in json response, even if null
return super().model_dump_json(exclude_none=False)
|
looks like linter errored too, |
Fokko
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't the spec off here?
AssertRefSnapshotId:
description:
The table branch or tag identified by the requirement's `ref` must reference the requirement's `snapshot-id`;
if `snapshot-id` is `null` or missing, the ref must not already exist
allOf:
- $ref: '#/components/schemas/TableRequirement'
required:
- ref
- snapshot-id
properties:
type:
type: string
const: "assert-ref-snapshot-id"
ref:
type: string
snapshot-id:
type: integer
format: int64When I read the description, the snapshot-id might be null or missing.
|
yea i read the same, but its marked as required so it can be null/missing but needs to exist in |
|
Ah, like that. Yes, I agree. Still the description of the spec could use an update, since it cannot be missing 🤣 |
|
@Fokko @kevinjqliu thanks for the fast review and merge! |
Closes #2342 # Rationale for this change The last attempt at fixing this (#2343) did not actually end up serializing None values to `null`. This was b/c the `AssertRefSnapshotId` requirement was often a child object of some other `Request` object, which was actually getting the `model_dump_json()` call. This meant that the `exclude_none=True` override would remove any `None`s _before_ any of the `Field`-specific configurations would be considered (like `exclude=False` or the model's own `model_dump_json` method). I then investigated setting some `ConfigDict` settings on the model, but that also did not have any useful configurations. I looked into setting `exclude_none=True` at all of the `model_json_dump()` callsites but there were too many and it would be easy to forget to set that value every time. The issue is that many values are _supposed to be absent_ to instruct the REST Catalog about what the client wants (things like filtering, for instance). ~The solution I ended up with was allowing `snapshot_id` to be a required `int` with a default value of `-1`, and adding a json-specific field serializer to write out `null` when dumping to json. This seems to work~ The solution that worked and allowed for `snapshot_id` to be set to `None` (as it's done in Ray Data), was to define a `model_serializer` method and always return the fields. This was based on conversations in pydantic/pydantic#7326. ## Are these changes tested? Yes ## Are there any user-facing changes? Actually allow `snapshot-id` to be null for `assert-ref-snapshot-id` requirements --------- Co-authored-by: Kevin Liu <kevin.jq.liu@gmail.com>
Closes #2342
Rationale for this change
The OpenAPI spec specifies that
snapshot-idis required 1, even if it'snull. The current code omits the field b/cexclude_none=Trueis set for all models that extendIcebergBaseModel. Settingexclude=Falseon the field doesn't override the behavior and thus the model needs to control it's own serialization. This is the only model I know of so far that has this "required null" problem so I held back from making another class for this model to extend.Are these changes tested?
Yes
Are there any user-facing changes?
Set
snapshot-id: nullwhen committing tables that don't have a current snapshot-id.