@XiaoHongbo-Hope
Contributor

Purpose

Linked issue: close #6962

Tests

BinaryRowTest

API and Format

Documentation

@XiaoHongbo-Hope XiaoHongbo-Hope marked this pull request as ready for review January 8, 2026 02:19
@JingsongLi
Contributor

Why does Python not produce an empty _MIN_VALUES? Is this different from Java?

@klboke

klboke commented Jan 8, 2026

After applying the fix, a new problem appeared. The error message is as follows:

INFO:pypaimon.catalog.rest.rest_token_file_io:end refresh data token for identifier [Identifier(database='adn', object='wide_table_200cols', branch=None)] expiresAtMillis [1767884499000]
2026-01-08 11:01:44,750 - paimon_dataset.py:189 - ERROR - Error reading table using Paimon API: '_VALUE_STATS_COLS'
Traceback (most recent call last):
  File "/Users/kl/worknamespace/infra-mono/pkg/tap_common_io/taptap_io/paimon_dataset.py", line 190, in _paimon_table_to_data_files
    raise e
  File "/Users/kl/worknamespace/infra-mono/pkg/tap_common_io/taptap_io/paimon_dataset.py", line 178, in _paimon_table_to_data_files
    splits = scan.plan().splits()
  File "/Users/kl/worknamespace/infra-mono/pkg/tap_common_io/.venv-py/lib/python3.10/site-packages/pypaimon/read/table_scan.py", line 45, in plan
    return self.starting_scanner.scan()
  File "/Users/kl/worknamespace/infra-mono/pkg/tap_common_io/.venv-py/lib/python3.10/site-packages/pypaimon/read/scanner/full_starting_scanner.py", line 77, in scan
    file_entries = self.plan_files()
  File "/Users/kl/worknamespace/infra-mono/pkg/tap_common_io/.venv-py/lib/python3.10/site-packages/pypaimon/read/scanner/full_starting_scanner.py", line 95, in plan_files
    return self.read_manifest_entries(manifest_files)
  File "/Users/kl/worknamespace/infra-mono/pkg/tap_common_io/.venv-py/lib/python3.10/site-packages/pypaimon/read/scanner/full_starting_scanner.py", line 102, in read_manifest_entries
    return self.manifest_file_manager.read_entries_parallel(manifest_files,
  File "/Users/kl/worknamespace/infra-mono/pkg/tap_common_io/.venv-py/lib/python3.10/site-packages/pypaimon/manifest/manifest_file_manager.py", line 57, in read_entries_parallel
    for entries in future_results:
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/concurrent/futures/_base.py", line 621, in result_iterator
    yield _result_or_cancel(fs.pop())
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/concurrent/futures/_base.py", line 319, in _result_or_cancel
    return fut.result(timeout)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/concurrent/futures/_base.py", line 458, in result
    return self.__get_result()
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/Users/kl/worknamespace/infra-mono/pkg/tap_common_io/.venv-py/lib/python3.10/site-packages/pypaimon/manifest/manifest_file_manager.py", line 51, in _process_single_manifest
    return self.read(manifest_file.file_name, manifest_entry_filter, drop_stats)
  File "/Users/kl/worknamespace/infra-mono/pkg/tap_common_io/.venv-py/lib/python3.10/site-packages/pypaimon/manifest/manifest_file_manager.py", line 90, in read
    fields = self._get_value_stats_fields(file_dict, schema_fields)
  File "/Users/kl/worknamespace/infra-mono/pkg/tap_common_io/.venv-py/lib/python3.10/site-packages/pypaimon/manifest/manifest_file_manager.py", line 134, in _get_value_stats_fields
    if file_dict['_VALUE_STATS_COLS'] is None:
KeyError: '_VALUE_STATS_COLS'
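
The KeyError above comes from indexing `file_dict` directly, which fails when a manifest written by an older version has no `_VALUE_STATS_COLS` column at all. A minimal sketch of a more tolerant lookup follows. The function name `value_stats_field_names` and the fallback to all schema fields are assumptions made for illustration, not the actual pypaimon implementation:

```python
from typing import Any, Dict, List, Optional


def value_stats_field_names(
    file_dict: Dict[str, Any], schema_field_names: List[str]
) -> List[str]:
    """Hypothetical helper: pick the field names that the value stats cover.

    Assumption: a missing '_VALUE_STATS_COLS' key is treated the same as an
    explicit None, meaning the stats cover every field in the schema.
    """
    # dict.get returns None instead of raising KeyError when the key is absent.
    cols: Optional[List[str]] = file_dict.get("_VALUE_STATS_COLS")
    if cols is None:
        return list(schema_field_names)
    return list(cols)
```

With this shape, a record lacking the key and a record carrying an explicit null take the same code path, so older manifests would not need special handling at the call site.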

@XiaoHongbo-Hope
Contributor Author

👌

@XiaoHongbo-Hope XiaoHongbo-Hope changed the title from "[python] Fix IndexError when reading manifest with empty _MIN_VALUES" to "[python/hotfix] Fix IndexError when reading manifest with empty _MIN_VALUES" on Jan 8, 2026
@XiaoHongbo-Hope
Contributor Author

Why does Python not produce an empty _MIN_VALUES? Is this different from Java?

Actually, Python and Java produce the same result for empty stats: both serialize to 12 bytes, and I added an assertion to show this. I searched the entire git history but did not find the root cause of the 0-byte _MIN_VALUES. The data was also written by an older version of pypaimon.
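
To make the 12-byte figure concrete, here is a rough sketch of where it could come from. It assumes the serialized form is a 4-byte big-endian arity prefix followed by a row whose header and null-bitset region is rounded up to a multiple of 8 bytes. The helper names and the `read_arity` guard are hypothetical and only illustrate how a reader might tolerate a 0-byte payload instead of indexing into it:

```python
import struct

# Assumption: the row reserves 8 header bits ahead of the null bitset.
HEADER_SIZE_IN_BITS = 8


def null_bitset_width_in_bytes(arity: int) -> int:
    """Header plus null bits, rounded up to a whole number of 8-byte words."""
    return ((arity + 63 + HEADER_SIZE_IN_BITS) // 64) * 8


def serialize_empty_row() -> bytes:
    """Assumed layout for a row with zero fields: arity prefix + zeroed header word."""
    arity = 0
    return struct.pack(">i", arity) + bytes(null_bitset_width_in_bytes(arity))


def read_arity(data: bytes) -> int:
    """Hypothetical guard: treat a payload too short to hold the arity as an empty row."""
    if len(data) < 4:
        return 0
    return struct.unpack(">i", data[:4])[0]
```

Under these assumptions an empty row is 4 + 8 = 12 bytes, so a 0-byte _MIN_VALUES cannot come from the normal write path, and a length check before unpacking is enough to avoid the IndexError on read.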

@XiaoHongbo-Hope XiaoHongbo-Hope marked this pull request as draft January 10, 2026 11:32

Development

Successfully merging this pull request may close these issues.

[Bug][PyPaimon] IndexError: index out of range when reading manifest with empty _MIN_VALUES