[WIP] Ray Data batch inference tutorial#3761

Open
crypdick wants to merge 29 commits into pytorch:main from crypdick:ray-data-batch-embeddings


@crypdick (Contributor) commented Feb 5, 2026

Adds a new Ray Data tutorial. I will move this out of draft after review from the Ray Data team.

cc @pcmoritz @robertnishihara @matthewdeng @richardliaw @akshay-anyscale

pytorch-bot commented Feb 5, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/tutorials/3761

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the cla signed label Feb 5, 2026
@svekars svekars added the ray PRs related to tutorials that use the ray project: https://github.com/ray-project/ray label Feb 5, 2026
@sekyondaMeta sekyondaMeta self-assigned this Feb 5, 2026
@crypdick crypdick marked this pull request as draft February 5, 2026 20:54
@svekars svekars marked this pull request as ready for review February 6, 2026 17:12
@sekyondaMeta sekyondaMeta added the skip-link-check label Feb 6, 2026 (Will allow you to skip linkcheck on a PR. Should only be used when a link can't be fixed.)
@crypdick crypdick force-pushed the ray-data-batch-embeddings branch from aca2b1f to a4ca298 February 10, 2026 07:45
@svekars (Contributor) commented Feb 10, 2026

@crypdick (Contributor, Author)

@svekars Thanks for the preview.

It's so weird that the preview shows a KeyboardInterrupt:
[Screenshot: KeyboardInterrupt shown in the rendered preview]

I also noticed that `img = Image.fromarray(img_array); img.show()` doesn't display the images. Do you know if it's possible to make them show up?

@svekars (Contributor) commented Feb 11, 2026

> `img = Image.fromarray(img_array); img.show()`

Using `plt.imshow(img_array); plt.show()` (after `import matplotlib.pyplot as plt`) should work.
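A minimal sketch of that suggestion, assuming the tutorial is rendered with sphinx-gallery (as the PyTorch tutorials site is), which captures matplotlib figures but not the external viewer window that PIL's `Image.show()` opens. `show_image` is a hypothetical helper name, and `img_array` is assumed to be an `H x W x 3` `uint8` NumPy array.

```python
import numpy as np
import matplotlib.pyplot as plt


def show_image(img_array: np.ndarray, title: str = ""):
    """Draw an H x W x 3 uint8 array on a matplotlib figure.

    sphinx-gallery scrapes matplotlib figures into the rendered page,
    whereas PIL's Image.show() hands the image to an external viewer.
    """
    fig, ax = plt.subplots()
    ax.imshow(img_array)
    ax.set_title(title)
    ax.axis("off")  # hide pixel-coordinate ticks
    plt.show()
    return fig  # returned so callers can inspect or save the figure


# Usage with a synthetic image: a red square on a black background.
img = np.zeros((32, 32, 3), dtype=np.uint8)
img[8:24, 8:24, 0] = 255
fig = show_image(img, title="red square")
```

Returning the figure is optional; it just makes the helper easy to reuse for saving or checking the output.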

# Typically, this link is
# ``http://localhost:8265``.
#
# TODO: Add screenshots of the Ray dashboard.
Contributor

do you want to add the screenshot or should we remove this?

Contributor Author

Yep, I plan to add a screenshot. This PR isn't ready for review from PyTorch yet; I'm still waiting on internal review.

Comment on lines +76 to +77
# Behind the scenes, ``read_images()`` spreads the downloads across all available
# nodes, using all the network bandwidth available to the cluster.
Contributor

This will also work on a laptop/single machine, where the download will be parallelized across processes.

# ------------------------------------------
#
# For batch inference, wrap the model in a class. By passing a class to
# ``map_batches()``, Ray creates **Actor** processes that recycle state between
Contributor

Users won't know what an Actor is. Maybe just say:
Ray Data spawns copies of this class in separate processes across the cluster; each copy is called an Actor. Actors preserve state between batches: each one loads the model once when it is instantiated and keeps it warm for subsequent batches, which avoids repeating the model download overhead.
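A minimal sketch of the pattern the suggested wording describes, assuming Ray Data's convention that a class passed to `map_batches()` is instantiated once per actor and then called on each batch (by default a dict mapping column names to NumPy arrays). `EmbedImages`, its random-projection "model", `embed_dim`, and `load_count` are hypothetical stand-ins for loading a real PyTorch model; the `"image"` column name matches what `read_images()` produces.

```python
import numpy as np


class EmbedImages:
    """Stateful batch UDF in the callable-class shape map_batches() accepts.

    Setup in __init__ runs once per actor; __call__ then runs on every batch
    that actor receives, reusing the already-loaded "model".
    """

    load_count = 0  # illustration only: counts how often setup runs

    def __init__(self, embed_dim: int = 4):
        # Hypothetical stand-in for the expensive part, e.g. downloading a
        # checkpoint and moving the model to the GPU.
        EmbedImages.load_count += 1
        rng = np.random.default_rng(0)
        self.projection = rng.standard_normal((3, embed_dim)).astype(np.float32)

    def __call__(self, batch: dict[str, np.ndarray]) -> dict[str, np.ndarray]:
        # Assumes batch["image"] has shape (N, H, W, 3), i.e. all images in
        # the batch share one size. Average pixels to (N, 3), then project
        # to (N, embed_dim).
        pixels = batch["image"].astype(np.float32).mean(axis=(1, 2))
        batch["embedding"] = pixels @ self.projection
        return batch


# Local simulation of a single actor: construct once, call on many batches.
udf = EmbedImages()
for _ in range(3):
    out = udf({"image": np.zeros((2, 8, 8, 3), dtype=np.uint8)})
```

In the tutorial you would pass the class itself rather than an instance, so Ray Data can construct one copy per actor, for example `ds.map_batches(EmbedImages, batch_size=32, concurrency=2)`. How the actor pool is sized (`concurrency=` versus `compute=ActorPoolStrategy(...)`) has varied across Ray releases, so check the `map_batches()` reference for the version the tutorial pins.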

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>

4 participants