@discivigour
Contributor

Purpose

Support reading a Paimon table as a PyTorch dataset.

Tests

  • torch_read_test.py

API and Format

TableRead.to_torch: converts Paimon table data to a PyTorch Dataset.
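For readers unfamiliar with the streaming flavour of a dataset, here is a minimal sketch of the idea in plain Python, with no torch or Paimon dependency. It assumes that `streaming=True` corresponds to an iterable-style dataset that yields rows lazily, split by split; the thread does not spell this out. `StreamingRows` and `read_split` are hypothetical names for illustration, not part of pypaimon. In real use such a class would subclass `torch.utils.data.IterableDataset`.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List

Row = Dict[str, Any]


class StreamingRows:
    """Iterable-style dataset sketch: rows are produced lazily, split by split.

    Hypothetical stand-in; a real version would subclass
    torch.utils.data.IterableDataset.
    """

    def __init__(self, splits: List[Any], read_split: Callable[[Any], Iterable[Row]]):
        self._splits = splits
        self._read_split = read_split  # hypothetical reader, not a Paimon API

    def __iter__(self) -> Iterator[Row]:
        for split in self._splits:
            # A split is only opened when the consumer reaches it.
            yield from self._read_split(split)


# Stand-in data: each "split" is just a list of rows here.
fake_splits = [[{"id": 1}, {"id": 2}], [{"id": 3}]]
reads: List[int] = []


def read_split(split: List[Row]) -> Iterator[Row]:
    reads.append(len(split))  # record when a split is actually opened
    return iter(split)


dataset = StreamingRows(fake_splits, read_split)
it = iter(dataset)
first = next(it)  # only the first split has been opened at this point
```

Because nothing is read until iteration reaches a split, memory use stays bounded by one split at a time, at the cost of losing random access by index.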

Documentation

Updated python-api.md.

@discivigour discivigour marked this pull request as ready for review January 9, 2026 09:06
from torch.utils.data import DataLoader

table_read = read_builder.new_read()
dataset = table_read.to_torch(splits, streaming=True)
Contributor

Can we find a reference for this parameter?

Contributor Author

return self._data[index]

def _load_data(self):
Contributor

@JingsongLi JingsongLi Jan 9, 2026

Inline this into __init__.

Contributor Author

👌
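As a rough illustration of the suggestion in this thread, the sketch below shows a map-style dataset whose loading happens directly in `__init__` rather than in a separate `_load_data` method. The PR's actual class is not shown here, so `InMemoryRows` and `read_split` are hypothetical names, and plain Python is used in place of `torch.utils.data.Dataset` to keep the example dependency-free.

```python
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]


class InMemoryRows:
    """Map-style dataset sketch with loading done directly in __init__.

    Hypothetical stand-in; a real version would subclass
    torch.utils.data.Dataset.
    """

    def __init__(self, splits: List[Any], read_split: Callable[[Any], Iterable[Row]]):
        # Loading is inlined here instead of living in a separate _load_data().
        self._data: List[Row] = [row for split in splits for row in read_split(split)]

    def __len__(self) -> int:
        return len(self._data)

    def __getitem__(self, index: int) -> Row:
        return self._data[index]


# Stand-in data: each "split" is a list of rows, and iter() acts as the reader.
rows = InMemoryRows([[{"id": 1}, {"id": 2}], [{"id": 3}]], read_split=iter)
```

With a single call site, inlining removes one level of indirection and makes it obvious that the object is fully populated once the constructor returns.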

@JingsongLi
Contributor

+1

@JingsongLi JingsongLi merged commit da8e246 into apache:master Jan 9, 2026
7 checks passed
2 participants