Skip to content

Conversation

@jayceslesar
Copy link
Contributor

Rationale for this change

We want to support pagination and we need to be able to return iterators to do so.

See #2158 for more context

Are these changes tested?

Existing tests were modified

Are there any user-facing changes?

yes, this is user facing and breaking but overall an improvement.

Copy link
Contributor

@rambleraptor rambleraptor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks awesome! Thanks for doing it. I've got one code question and then a versioning question:

Do we need to wait for 1.0 to change these APIs? I'm hoping the answer will be no.

Comment on lines 575 to 576
if tables := list(self.list_tables(namespace)):
raise NamespaceNotEmptyError(f"Namespace {namespace_str} is not empty. {len(list(tables))} tables exist.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we (eventually) handle pagination tokens, I believe this will cause the process to consume the entire paginated list. If it's a long list, that'll require a lot of server calls to fetch all of the tables.

If that's true, what if we did this instead?

Suggested change
if tables := list(self.list_tables(namespace)):
raise NamespaceNotEmptyError(f"Namespace {namespace_str} is not empty. {len(list(tables))} tables exist.")
tables = self.list_tables(namespace)
try:
next(tables)
raise NamespaceNotEmptyError(f"Namespace {namespace_str} is not empty. {len(list(tables))} tables exist.")
except StopIteration:
pass

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, maybe it makes the most sense to make a little helper function here for if a namespace is empty or not? can do that in a different MR but i dont know if thats an official method upstream https://github.com/search?q=repo%3Aapache%2Ficeberg%20namespacenotempty&type=code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah this makes sense, was thinking could consolidate logic across catalogs but not sure if that makes sense given the implementation differences

@jayceslesar
Copy link
Contributor Author

Just brought this up to date if you want to give another look! I am going to take a look at your existing comment later!

@jayceslesar
Copy link
Contributor Author

Still need to do some work here -- brought in a lot of those formatting changes and havent looked in a while hahahaha

# Hierarchical namespace is not supported. Return an empty list
if namespace:
return []
return iter([])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be best not to use iter when not needed. In fact, we only need it for the list-tables of the REST Catalog. Does mypy allow to use list here? Same for line 748

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mypy says

  pyiceberg/catalog/hive.py:746: error: Incompatible return value type (got "list[Never]", expected "Iterator[tuple[str, ...]]")  [return-value]
  pyiceberg/catalog/hive.py:746: note: "list" is missing following "Iterator" protocol member:
  pyiceberg/catalog/hive.py:746: note:     __next__
  Found 1 error in 1 file (checked 166 source files)

Copy link
Contributor

@Fokko Fokko Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. The reason I ask is because iter doesn't allow for len:

Python 3.14.1 (main, Dec  2 2025, 12:51:37) [Clang 17.0.0 (clang-1700.4.4.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> len(iter([1,2,3]))
Traceback (most recent call last):
  File "<python-input-0>", line 1, in <module>
    len(iter([1,2,3]))
    ~~~^^^^^^^^^^^^^^^
TypeError: object of type 'list_iterator' has no len()
>>> len([1,2,3])
3

@jayceslesar
Copy link
Contributor Author

jayceslesar commented Dec 5, 2025

I feel like we should hold off on merging this, and additionally hold off on a change I want to make for the inspect tables to also be lazy until we are ready for a 1.0.0? Wdyt? I don't think it's wise to just release a breaking change on this interface without some form of warning.

Happy to keep this periodically up to date if we want to go that route

@Fokko Fokko removed this from the PyIceberg 0.11.0 milestone Dec 22, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants