[python/hotfix] Fix IndexError when reading manifest with empty _MIN_VALUES #6971
Conversation
Why doesn't Python produce an empty _MIN_VALUES? Is this different from Java?
After the fix, a new problem appeared. The error message is as follows:
INFO:pypaimon.catalog.rest.rest_token_file_io:end refresh data token for identifier [Identifier(database='adn', object='wide_table_200cols', branch=None)] expiresAtMillis [1767884499000]
2026-01-08 11:01:44,750 - paimon_dataset.py:189 - ERROR - Error reading table using Paimon API: '_VALUE_STATS_COLS'
Traceback (most recent call last):
  File "/Users/kl/worknamespace/infra-mono/pkg/tap_common_io/taptap_io/paimon_dataset.py", line 190, in _paimon_table_to_data_files
    raise e
  File "/Users/kl/worknamespace/infra-mono/pkg/tap_common_io/taptap_io/paimon_dataset.py", line 178, in _paimon_table_to_data_files
    splits = scan.plan().splits()
  File "/Users/kl/worknamespace/infra-mono/pkg/tap_common_io/.venv-py/lib/python3.10/site-packages/pypaimon/read/table_scan.py", line 45, in plan
    return self.starting_scanner.scan()
  File "/Users/kl/worknamespace/infra-mono/pkg/tap_common_io/.venv-py/lib/python3.10/site-packages/pypaimon/read/scanner/full_starting_scanner.py", line 77, in scan
    file_entries = self.plan_files()
  File "/Users/kl/worknamespace/infra-mono/pkg/tap_common_io/.venv-py/lib/python3.10/site-packages/pypaimon/read/scanner/full_starting_scanner.py", line 95, in plan_files
    return self.read_manifest_entries(manifest_files)
  File "/Users/kl/worknamespace/infra-mono/pkg/tap_common_io/.venv-py/lib/python3.10/site-packages/pypaimon/read/scanner/full_starting_scanner.py", line 102, in read_manifest_entries
    return self.manifest_file_manager.read_entries_parallel(manifest_files,
  File "/Users/kl/worknamespace/infra-mono/pkg/tap_common_io/.venv-py/lib/python3.10/site-packages/pypaimon/manifest/manifest_file_manager.py", line 57, in read_entries_parallel
    for entries in future_results:
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/concurrent/futures/_base.py", line 621, in result_iterator
    yield _result_or_cancel(fs.pop())
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/concurrent/futures/_base.py", line 319, in _result_or_cancel
    return fut.result(timeout)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/concurrent/futures/_base.py", line 458, in result
    return self.__get_result()
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/Users/kl/worknamespace/infra-mono/pkg/tap_common_io/.venv-py/lib/python3.10/site-packages/pypaimon/manifest/manifest_file_manager.py", line 51, in _process_single_manifest
    return self.read(manifest_file.file_name, manifest_entry_filter, drop_stats)
  File "/Users/kl/worknamespace/infra-mono/pkg/tap_common_io/.venv-py/lib/python3.10/site-packages/pypaimon/manifest/manifest_file_manager.py", line 90, in read
    fields = self._get_value_stats_fields(file_dict, schema_fields)
  File "/Users/kl/worknamespace/infra-mono/pkg/tap_common_io/.venv-py/lib/python3.10/site-packages/pypaimon/manifest/manifest_file_manager.py", line 134, in _get_value_stats_fields
    if file_dict['_VALUE_STATS_COLS'] is None:
KeyError: '_VALUE_STATS_COLS'
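The `KeyError` above comes from indexing `file_dict` with a key that the manifest record does not contain. A minimal sketch of a tolerant lookup follows. The helper name `select_value_stats_fields`, the use of plain strings for schema fields, and the choice to treat a missing key like a null value are all assumptions for illustration, not pypaimon's actual implementation.

```python
def select_value_stats_fields(file_dict, schema_fields):
    """Hypothetical helper: pick the fields that value stats cover.

    file_dict     -- one decoded manifest record (a plain dict here)
    schema_fields -- field names of the table schema (plain strings here)
    """
    # dict.get returns None for a missing key instead of raising KeyError,
    # so records written without _VALUE_STATS_COLS no longer fail.
    stats_cols = file_dict.get('_VALUE_STATS_COLS')
    if stats_cols is None:
        # Assumption: an absent or null column list means that the stats
        # cover every schema field.
        return list(schema_fields)
    wanted = set(stats_cols)
    # Keep schema order; an empty list yields no stats fields at all.
    return [name for name in schema_fields if name in wanted]


print(select_value_stats_fields({}, ['id', 'name']))
print(select_value_stats_fields({'_VALUE_STATS_COLS': ['name']}, ['id', 'name']))
```

Whether a missing key should really behave like a null value depends on how older writers produced these files, so that default would need to be confirmed against the manifest schema.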
👌
Actually, Python and Java produce the same result for empty stats: both serialize to 12 bytes, and I added an assertion to show this. I searched the entire git history but did not find the root cause of the 0-byte _MIN_VALUES. The data was written by an older version of pypaimon, too.
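The 12-byte figure can be reproduced with a little arithmetic. The sketch below assumes the serialized form is a 4-byte big-endian arity followed by the row's fixed-length part, and that a zero-field row consists only of an 8-byte null-bitset word (which includes an 8-bit header). The function names are hypothetical, and `decode_arity` only illustrates one way a reader could tolerate a 0-byte value.

```python
import struct

HEADER_SIZE_IN_BITS = 8  # assumed header width inside the null bitset


def bitset_width_in_bytes(arity):
    # Round (header bits + one null bit per field) up to whole 8-byte words.
    return ((arity + 63 + HEADER_SIZE_IN_BITS) // 64) * 8


def serialize_empty_row():
    # Assumed layout: 4-byte big-endian arity, then the fixed-length part.
    arity = 0
    return struct.pack('>i', arity) + bytes(bitset_width_in_bytes(arity))


def decode_arity(data):
    # Hypothetical guard: treat a 0-byte payload as a row with no fields
    # instead of indexing into an empty buffer.
    if len(data) == 0:
        return 0
    return struct.unpack('>i', data[:4])[0]


print(len(serialize_empty_row()))
print(decode_arity(b''))
```

Under these assumptions an empty row is 4 + 8 = 12 bytes, so a 0-byte `_MIN_VALUES` is not something this serialization path would produce on its own.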
Purpose
Linked issue: close #6962
Tests
BinaryRowTest
API and Format
Documentation