Conversation

@Fokko (Contributor) commented Aug 28, 2025

Rationale for this change

It looks like we always downcast Arrow nanosecond types to microseconds.

cc @sungwy @kevinjqliu

Are these changes tested?

Are there any user-facing changes?


@pytest.mark.parametrize("format_version", [1, 2, 3])
def test_task_to_record_batches_nanos(format_version: TableVersion, tmpdir: str) -> None:
from datetime import datetime
Member:

This import seems unnecessary now.

@Fokko (Contributor, Author):

Good one, I expected the linter to clean that up 🚀

@rambleraptor (Contributor) left a comment:

One small question, but it looks good to me otherwise.

self._bound_row_filter = bind(table_metadata.schema(), row_filter, case_sensitive=case_sensitive)
self._case_sensitive = case_sensitive
self._limit = limit
self._downcast_ns_timestamp_to_us = Config().get_bool(DOWNCAST_NS_TIMESTAMP_TO_US_ON_WRITE)
Contributor:
Do we need to check the format version for downcasting? (We have the table_metadata already, so we have access to it)
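The question is relevant because the Iceberg V3 spec adds nanosecond timestamp types (`timestamp_ns`/`timestamptz_ns`), while V1 and V2 only support microsecond precision. A hypothetical sketch of such a check (the helper name and shape are illustrative, not PyIceberg's API):

```python
def should_downcast_ns_to_us(format_version: int, config_flag: bool) -> bool:
    """Hypothetical check: V3 tables can store nanoseconds natively, so the
    downcast would only apply to V1/V2 tables, subject to the config flag."""
    if format_version >= 3:
        return False  # Iceberg V3 adds timestamp_ns; no need to truncate
    return config_flag
```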

@kevinjqliu (Contributor) left a comment:

LGTM!

@raulcd (Member) left a comment:

Approving, as my previous comment has been fixed.

@kevinjqliu (Contributor):

It would be great to add an integration test showing Spark writing V3 with nanos and reading with PyIceberg. We can do that as a follow-up.

@kevinjqliu kevinjqliu merged commit 52ff684 into apache:main Aug 28, 2025
10 checks passed
@kevinjqliu (Contributor):

Thanks for the PR @Fokko, and thanks @rambleraptor and @raulcd for the review!
