[WIP] Ray Data batch inference tutorial #3761

Open

crypdick wants to merge 29 commits into pytorch:main from crypdick:ray-data-batch-embeddings

Contributor

crypdick commented Feb 5, 2026 (edited)

Adds a new Ray Data tutorial. I will move this out of draft after review from the Ray Data team.

cc @pcmoritz @robertnishihara @matthewdeng @richardliaw @akshay-anyscale

Ricardo Decal added 9 commits

February 2, 2026 19:34


          initial draft

ed5bce6


          2nd pass

0990b56


          add stats example

0b60c82


          increase bs, increase dset size, consolidate section

d95ffc9


          lint

371e64d


          lint

83e49af


          another pass

6c48115


          checkpoint

2cd1d0e


          edit pass

467af14

pytorch-bot bot commented Feb 5, 2026 (edited)

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/tutorials/3761

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-cla bot added the cla signed label


          Merge branch 'main' into ray-data-batch-embeddings

e144d79

svekars added the ray label

sekyondaMeta self-assigned this

crypdick marked this pull request as draft

February 5, 2026 20:54


          add tutorial to docs index; use GPU in CI; add ray data logo

a4ca298

svekars marked this pull request as ready for review

February 6, 2026 17:12

sekyondaMeta added the skip-link-check label

crypdick force-pushed the ray-data-batch-embeddings branch from aca2b1f to a4ca298 Compare

February 10, 2026 07:45

Ricardo Decal added 2 commits

February 10, 2026 10:14


          add tiktoken to deps

2fe457b


          rm 4xlarge machine requirement for batch inference tutorial

dd116b9

svekars reviewed

index.rst (resolved)

Contributor

svekars commented Feb 10, 2026

Preview: https://docs-preview.pytorch.org/pytorch/tutorials/3761/beginner/batch_inference_tutorial.html

Ricardo Decal added 3 commits

February 10, 2026 14:55


          add batch inference tutorial to ecosystem.rst

8cea2fe


          Reduce Ray Data verbosity


          minor edits; try to fix text codefence

2866abe

Contributor Author

crypdick commented Feb 11, 2026

@svekars Thanks for the preview.

It's so weird that the preview shows a KeyboardInterrupt:

I also noticed that `img = Image.fromarray(img_array); img.show()` is not displaying the images in the preview. Do you know if that's possible?

Contributor

svekars commented Feb 11, 2026

img = Image.fromarray(img_array); img.show()

Use matplotlib instead: plt.imshow(img_array); plt.show() should work (with import matplotlib.pyplot as plt).
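
For reference, a minimal sketch of the matplotlib approach suggested above. The array is a synthetic placeholder (an assumption; in the tutorial `img_array` would come from the dataset), and `show_image` is a hypothetical helper name, not something from the PR.

```python
import numpy as np
import matplotlib.pyplot as plt


def show_image(img_array):
    """Render an H x W x 3 uint8 array on a new figure and return the figure."""
    fig, ax = plt.subplots()
    ax.imshow(img_array)  # draws the array as an RGB image
    ax.axis("off")        # hide axis ticks, which add nothing for a photo
    return fig


# Placeholder image (assumption): left half red, right half black.
img_array = np.zeros((32, 32, 3), dtype=np.uint8)
img_array[:, :16, 0] = 255

fig = show_image(img_array)
plt.show()  # renders the open figure; in a docs build the figure is typically captured into the page
```

Unlike `img.show()`, which hands the image to an external viewer, this draws onto a matplotlib figure, which is presumably why it shows up in the rendered preview.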

Ricardo Decal added 2 commits

February 11, 2026 14:17


          test plt.imshow() for displaying imgs


          test if log_to_driver=False fixes KeyboardInterrupt

a4a3664


          Merge branch 'main' into ray-data-batch-embeddings

4f9685b

svekars reviewed

.devcontainer/requirements.txt (resolved)

sekyondaMeta and others added 3 commits

February 17, 2026 11:25


          Merge branch 'main' into ray-data-batch-embeddings

b1523af


          Merge branch 'main' into ray-data-batch-embeddings

09d7753


          Merge branch 'main' into ray-data-batch-embeddings

5ab1e0d

svekars reviewed

beginner_source/batch_inference_tutorial.py (outdated, resolved)

Ricardo Decal and others added 4 commits

February 23, 2026 11:07


          revert tiktoken addition

c897218


          fix clobbered code fence

5ff48b7


          fix plot display

0f2ca72


          Merge branch 'main' into ray-data-batch-embeddings

d88889a

svekars reviewed

beginner_source/batch_inference_tutorial.py

+              # Typically, this link is
+              # ``http://localhost:8265``.
+              #
+              # TODO: Add screenshots of the Ray dashboard.

Contributor

svekars Feb 24, 2026

do you want to add the screenshot or should we remove this?

Contributor Author

crypdick Feb 24, 2026

yep I plan to add a screenshot. this PR isn't ready for review from pytorch yet, I'm still waiting on internal review


          Merge branch 'main' into ray-data-batch-embeddings

7df5b7d

richardliaw reviewed

beginner_source/batch_inference_tutorial.py

Comment on lines +76 to +77

		# Behind the scenes, ``read_images()`` spreads the downloads across all available
		# nodes, using all the network bandwidth available to the cluster.

Contributor

richardliaw Feb 25, 2026

This will also work on a laptop/single machine, where the download will be parallelized across processes.

richardliaw reviewed

beginner_source/batch_inference_tutorial.py

+              # ------------------------------------------
+              #
+              # For batch inference, wrap the model in a class. By passing a class to
+              # ``map_batches()``, Ray creates **Actor** processes that recycle state between

Contributor

richardliaw Feb 25, 2026

users won't know what an Actor is - maybe just say:
Ray Data will spawn copies of this class in different processes across the cluster; each copy is called an Actor. These actors preserve state between batches: each one loads the model upon instantiation and keeps it warm for subsequent batches, which avoids repeating the model download overhead.
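
To illustrate the point about preserved state, here is a plain-Python sketch of the pattern, with no Ray involved. `EmbeddingPredictor`, its fake "model", and the batch layout are all hypothetical stand-ins, not the tutorial's actual code. The only claim is that setup in `__init__` runs once per instance while `__call__` runs once per batch.

```python
class EmbeddingPredictor:
    """Hypothetical stand-in for the class the tutorial passes to map_batches()."""

    load_count = 0  # counts how many times the expensive setup ran

    def __init__(self):
        # Expensive one-time setup (e.g. downloading and loading model weights).
        # Here it is faked with a trivial function that doubles each value.
        EmbeddingPredictor.load_count += 1
        self.model = lambda values: [v * 2 for v in values]

    def __call__(self, batch):
        # Called once per batch; reuses self.model instead of reloading it.
        batch["output"] = self.model(batch["input"])
        return batch


# One instance plays the role of one actor processing several batches.
predictor = EmbeddingPredictor()
batches = [{"input": [1, 2]}, {"input": [3]}]
results = [predictor(b) for b in batches]

print(EmbeddingPredictor.load_count)  # setup ran once, not once per batch
print(results[1]["output"])
```

With Ray Data, the class itself (not an instance) would be passed to `map_batches()`, and Ray would construct the instances inside its own worker processes; the exact arguments for sizing the actor pool depend on the Ray version.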


          remove FIXMEs

27a32dd

richardliaw reviewed

beginner_source/batch_inference_tutorial.py (outdated, resolved)


          Update beginner_source/batch_inference_tutorial.py

c4ae111

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>

Labels

cla signed ray skip-link-check