-
Notifications
You must be signed in to change notification settings - Fork 412
Modify list_* methods in catalogs to return Iterators
#2172
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Modify list_* methods in catalogs to return Iterators
#2172
Conversation
rambleraptor
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks awesome! Thanks for doing it. I've got one code question and then a versioning question:
Do we need to wait for 1.0 to change these APIs? I'm hoping the answer will be no.
pyiceberg/catalog/sql.py
Outdated
| if tables := list(self.list_tables(namespace)): | ||
| raise NamespaceNotEmptyError(f"Namespace {namespace_str} is not empty. {len(list(tables))} tables exist.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we (eventually) handle pagination tokens, I believe this will cause the process to consume the entire paginated list. If it's a long list, that'll require a lot of server calls to fetch all of the tables.
If that's true, what if we did this instead?
| if tables := list(self.list_tables(namespace)): | |
| raise NamespaceNotEmptyError(f"Namespace {namespace_str} is not empty. {len(list(tables))} tables exist.") | |
| tables = self.list_tables(namespace) | |
| try: | |
| next(tables) | |
| raise NamespaceNotEmptyError(f"Namespace {namespace_str} is not empty. {len(list(tables))} tables exist.") | |
| except StopIteration: | |
| pass | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, maybe it makes the most sense to make a little helper function here for if a namespace is empty or not? can do that in a different MR but i dont know if thats an official method upstream https://github.com/search?q=repo%3Aapache%2Ficeberg%20namespacenotempty&type=code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah this makes sense, was thinking could consolidate logic across catalogs but not sure if that makes sense given the implementation differences
|
Just brought this up to date if you want to give another look! I am going to take a look at your existing comment later! |
|
Still need to do some work here -- brought in a lot of those formatting changes and havent looked in a while hahahaha |
| # Hierarchical namespace is not supported. Return an empty list | ||
| if namespace: | ||
| return [] | ||
| return iter([]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be best not to use iter when not needed. In fact, we only need it for the list-tables of the REST Catalog. Does mypy allow to use list here? Same for line 748
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mypy says
pyiceberg/catalog/hive.py:746: error: Incompatible return value type (got "list[Never]", expected "Iterator[tuple[str, ...]]") [return-value]
pyiceberg/catalog/hive.py:746: note: "list" is missing following "Iterator" protocol member:
pyiceberg/catalog/hive.py:746: note: __next__
Found 1 error in 1 file (checked 166 source files)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. The reason I ask is because iter doesn't allow for len:
Python 3.14.1 (main, Dec 2 2025, 12:51:37) [Clang 17.0.0 (clang-1700.4.4.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> len(iter([1,2,3]))
Traceback (most recent call last):
File "<python-input-0>", line 1, in <module>
len(iter([1,2,3]))
~~~^^^^^^^^^^^^^^^
TypeError: object of type 'list_iterator' has no len()
>>> len([1,2,3])
3
|
I feel like we should hold off on merging this, and additionally hold off on a change I want to make for the inspect tables to also be lazy until we are ready for a 1.0.0? Wdyt? I don't think it's wise to just release a breaking change on this interface without some form of warning. Happy to keep this periodically up to date if we want to go that route |
Rationale for this change
We want to support pagination and we need to be able to return iterators to do so.
See #2158 for more context
Are these changes tested?
Existing tests were modified
Are there any user-facing changes?
yes, this is user facing and breaking but overall an improvement.