-
Notifications
You must be signed in to change notification settings - Fork 412
Feat/update sort order #2552
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat/update sort order #2552
Conversation
fix: set default Co-authored-by: Fokko Driesprong <fokko@apache.org>
chore: update signature Co-authored-by: Fokko Driesprong <fokko@apache.org>
…0/iceberg-python into feat/update-sort-order
…assigned id, else set a previous sort order as the default
| def asc( | ||
| self, source_column_name: str, transform: Transform[Any, Any], null_order: NullOrder = NullOrder.NULLS_LAST | ||
| ) -> UpdateSortOrder: | ||
| """Add a sort field with ascending order.""" | ||
| return self._add_sort_field( | ||
| source_id=self._column_name_to_id(source_column_name), | ||
| transform=transform, | ||
| direction=SortDirection.ASC, | ||
| null_order=null_order, | ||
| ) | ||
|
|
||
| def desc( | ||
| self, source_column_name: str, transform: Transform[Any, Any], null_order: NullOrder = NullOrder.NULLS_LAST | ||
| ) -> UpdateSortOrder: | ||
| """Add a sort field with descending order.""" | ||
| return self._add_sort_field( | ||
| source_id=self._column_name_to_id(source_column_name), | ||
| transform=transform, | ||
| direction=SortDirection.DESC, | ||
| null_order=null_order, | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why have these be separate methods instead of an argument/parameter for a sort field?
I guess I would rather just do
table.update_sort_order([SortField1, SortField2, ...])Ideally whatever abstraction in the above syntax also takes care of the _column_name_to_id conversion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think its done this way to provider a "builder" for the desired sort order, and to check against previously set sort orders
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah it definitely is cleaner than making some static method of the sort idea need to be aware of the schema of the table to map field ids
|
|
||
| Sort orders can only be updated by adding a new sort order. They cannot be deleted or modified. | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if the name of the method should be set_sort_order then, as update_sort_order possibly implies that it could be a modification?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think either is fine. theres a updateSortOrderSchema function in java https://grep.app/search?f.repo=apache%2Ficeberg&q=updatesort
| ) | ||
| def test_update_sort_order(catalog: Catalog, format_version: str, table_schema_simple: Schema) -> None: | ||
| simple_table = _simple_table(catalog, table_schema_simple, format_version) | ||
| simple_table.update_sort_order().asc("foo", IdentityTransform(), NullOrder.NULLS_FIRST).desc( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should use context manager here? to test the auto commit path
|
Maybe we should get this in, to credit @JasperHG90's work, and you can polish the remaining item :) |
kevinjqliu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
the update_sort_order api is used to build the desired sort order. if it matches a previous sort order, the previous sort order id will be reused. otherwise, we'll create a new sort order with an incremented id
|
Thank you @JasperHG90 for working on this! Appreciate all the efforts here :) This pr sets a great foundation, we can continue to refine it. I personally want to see an integration with our spark tests |
Rationale for this change
Fixed the tests in #1500, but kept the commits so @JasperHG90 gets all the credits for the great work!
Closes #1500
Closes #1245
Are these changes tested?
Are there any user-facing changes?