Contributor

@Fokko Fokko commented Mar 18, 2025

Rationale for this change

Yikes! This makes sure to only produce a snapshot when there is something to update or append.

Are these changes tested?

Yes, by checking the snapshots that are being produced.

Are there any user-facing changes?

Smaller metadata and faster commits when there is nothing to append/update :)
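
To illustrate the idea, here is a minimal, self-contained sketch of guarding each write so that an empty batch never produces a snapshot. `ToyTransaction` and `toy_upsert` are hypothetical stand-ins made up for this example; they are not PyIceberg's classes, and the real change in this PR may be structured differently.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTransaction:
    """Hypothetical stand-in for a table transaction; records one entry per snapshot."""

    snapshots: list = field(default_factory=list)

    def overwrite(self, rows: list) -> None:
        # In this toy model every write call produces a snapshot, even for zero rows.
        self.snapshots.append(("overwrite", len(rows)))

    def append(self, rows: list) -> None:
        self.snapshots.append(("append", len(rows)))


def toy_upsert(tx: ToyTransaction, rows_to_update: list, rows_to_insert: list) -> None:
    # Guard each write so that an empty batch never reaches the write path.
    if len(rows_to_update) > 0:
        tx.overwrite(rows_to_update)
    if len(rows_to_insert) > 0:
        tx.append(rows_to_insert)


tx = ToyTransaction()
toy_upsert(tx, rows_to_update=[], rows_to_insert=[])
assert tx.snapshots == []  # nothing to do, so no snapshot

toy_upsert(tx, rows_to_update=[], rows_to_insert=[{"id": 1}])
assert tx.snapshots == [("append", 1)]  # only the non-empty write is recorded
```

Counting the recorded snapshots after each call mirrors the testing approach described above, which checks the snapshots that are produced.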

Collaborator

@sungwy sungwy left a comment

LGTM 🚀

Contributor

@kevinjqliu kevinjqliu left a comment

LGTM!

Comment on lines -1204 to +1205
tx.overwrite(rows_to_update, overwrite_filter=overwrite_mask_predicate)
tx.overwrite(rows_to_update, overwrite_filter=overwrite_mask_predicate)
Contributor

Should we enforce not producing empty snapshots lower in the stack, in the write functions themselves (overwrite/append)?

Contributor Author

Yes, that's a great suggestion. I was also looking into that, but there are two caveats:

  • Until all the manifests have been computed, we don't know whether they are empty. The delete manifests are computed lazily, so short-circuiting earlier in the process means we avoid doing that work at all.
  • There is also a behavioral aspect to it: for example, when you call tbl.append(df) where df is empty, or tbl.delete('x = 10'), you may still want to produce a snapshot to indicate that something happened. My opinion is that it should result in a no-op, but Java creates a snapshot:

[image]

I believe @sungwy and I had a similar discussion.
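
The first caveat can be sketched with a toy model. All names below (`toy_delete_manifests`, `commit_checking_low`, `commit_checking_early`) are hypothetical and only mimic the shape of the problem; they are not PyIceberg internals. The generator stands in for lazily computed delete manifests: checking for emptiness inside the write path forces that computation, while a guard at the call site skips it.

```python
def toy_delete_manifests(work_log):
    """Generator: the body only runs once something iterates over it."""
    work_log.append("scanned existing manifests")  # stands in for the expensive part
    yield from []  # in this toy case, nothing needs to be deleted


def commit_checking_low(rows, work_log):
    """Option 1: decide inside the write path, after computing the manifests."""
    manifests = list(toy_delete_manifests(work_log))  # forces the lazy computation
    if not rows and not manifests:
        return None  # no snapshot, but the work has already been done
    return "snapshot"


def commit_checking_early(rows, work_log):
    """Option 2: short-circuit at the call site before any manifest work."""
    if not rows:
        return None  # no snapshot and no manifest work
    list(toy_delete_manifests(work_log))
    return "snapshot"


log_low, log_early = [], []
assert commit_checking_low([], log_low) is None
assert commit_checking_early([], log_early) is None
assert log_low == ["scanned existing manifests"]  # paid for the scan anyway
assert log_early == []  # skipped the scan entirely
```

Both options skip the snapshot for an empty write; only the early guard also skips the manifest computation. The sketch deliberately leaves the second caveat open, namely whether an empty append or delete should still create a snapshot.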

@Fokko Fokko merged commit 6658187 into apache:main Mar 19, 2025
7 checks passed
@Fokko Fokko deleted the fd-bug branch March 19, 2025 11:37
@Fokko Fokko added this to the PyIceberg 0.9.1 milestone Apr 20, 2025
Fokko added a commit that referenced this pull request Apr 25, 2025
gabeiglio pushed a commit to Netflix/iceberg-python that referenced this pull request Aug 13, 2025

3 participants