Skip to content

Conversation

@ox
Copy link
Contributor

@ox ox commented Aug 18, 2025

Closes #2342

Rationale for this change

The OpenAPI spec specifies that snapshot-id is required 1, even if it's null. The current code omits the field b/c exclude_none=True is set for all models that extend IcebergBaseModel. Setting exclude=False on the field doesn't override the behavior and thus the model needs to control it's own serialization. This is the only model I know of so far that has this "required null" problem so I held back from making another class for this model to extend.

Are these changes tested?

Yes

Are there any user-facing changes?

Set snapshot-id: null when committing tables that don't have a current snapshot-id.

Copy link
Contributor

@kevinjqliu kevinjqliu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for the PR! This makes sense.

Comment on lines 759 to 765
# override the override method, allowing None to serialize to `null` instead of being omitted.
def model_dump_json(
self, exclude_none: bool = False, exclude: Optional[Set[str]] = None, by_alias: bool = True, **kwargs: Any
) -> str:
return super().model_dump_json(
exclude_none=exclude_none, exclude=self._exclude_private_properties(exclude), by_alias=by_alias, **kwargs
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: i think we can simply this to

    def model_dump_json(self) -> str:
        # `snapshot-id` is required in json response, even if null
        return super().model_dump_json(exclude_none=False)

@kevinjqliu
Copy link
Contributor

looks like linter errored too,
could you run make lint locally?

Copy link
Contributor

@Fokko Fokko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the spec off here?

    AssertRefSnapshotId:
      description:
        The table branch or tag identified by the requirement's `ref` must reference the requirement's `snapshot-id`; 
        if `snapshot-id` is `null` or missing, the ref must not already exist
      allOf:
        - $ref: '#/components/schemas/TableRequirement'
      required:
        - ref
        - snapshot-id
      properties:
        type:
          type: string
          const: "assert-ref-snapshot-id"
        ref:
          type: string
        snapshot-id:
          type: integer
          format: int64

When I read the description, the snapshot-id might be null or missing.

@kevinjqliu
Copy link
Contributor

yea i read the same, but its marked as required

      required:
        - ref
        - snapshot-id

so it can be null/missing but needs to exist in AssertRefSnapshotId

@Fokko
Copy link
Contributor

Fokko commented Aug 19, 2025

Ah, like that. Yes, I agree. Still the description of the spec could use an update, since it cannot be missing 🤣

Fokko and others added 3 commits August 19, 2025 14:31
Co-authored-by: Kevin Liu <kevinjqliu@users.noreply.github.com>
@kevinjqliu kevinjqliu merged commit fa9094b into apache:main Aug 19, 2025
10 checks passed
@kevinjqliu
Copy link
Contributor

Thanks for the PR @ox and thanks @Fokko for the review :)

@ox
Copy link
Contributor Author

ox commented Aug 19, 2025

@Fokko @kevinjqliu thanks for the fast review and merge!

kevinjqliu added a commit that referenced this pull request Aug 25, 2025
Closes #2342

# Rationale for this change

The last attempt at fixing this (#2343) did not actually end up
serializing None values to `null`. This was b/c the
`AssertRefSnapshotId` requirement was often a child object of some other
`Request` object, which was actually getting the `model_dump_json()`
call. This meant that the `exclude_none=True` override would remove any
`None`s _before_ any of the `Field`-specific configurations would be
considered (like `exclude=False` or the model's own `model_dump_json`
method).

I then investigated setting some `ConfigDict` settings on the model, but
that also did not have any useful configurations.

I looked into setting `exclude_none=True` at all of the
`model_json_dump()` callsites but there were too many and it would be
easy to forget to set that value every time. The issue is that many
values are _supposed to be absent_ to instruct the REST Catalog about
what the client wants (things like filtering, for instance).

~The solution I ended up with was allowing `snapshot_id` to be a
required `int` with a default value of `-1`, and adding a json-specific
field serializer to write out `null` when dumping to json. This seems to
work~

The solution that worked and allowed for `snapshot_id` to be set to
`None` (as it's done in Ray Data), was to define a `model_serializer`
method and always return the fields. This was based on conversations in
pydantic/pydantic#7326.

## Are these changes tested?

Yes

## Are there any user-facing changes?

Actually allow `snapshot-id` to be null for `assert-ref-snapshot-id`
requirements

---------

Co-authored-by: Kevin Liu <kevin.jq.liu@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

missing snapshot-id for assert-ref-snapshot-id requirement marshals to invalid JSON

3 participants