[WIP] Ray Data batch inference tutorial #3761
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/tutorials/3761
Note: Links to docs will display an error until the docs builds have been completed.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Force-pushed aca2b1f to a4ca298
@svekars Thanks for the preview. It's so weird that the preview shows a … I also noticed that the …
# Typically, this link is
# ``http://localhost:8265``.
#
# TODO: Add screenshots of the Ray dashboard.
do you want to add the screenshot or should we remove this?
Yep, I plan to add a screenshot. This PR isn't ready for review from PyTorch yet; I'm still waiting on internal review.
# Behind the scenes, ``read_images()`` spreads the downloads across all available
# nodes, using all the network bandwidth available to the cluster.
This will also work on a laptop/single machine, where the download will be parallelized across processes.
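This isn't the tutorial's code, but the single-machine parallelism described above can be sketched with a plain thread pool, where `fetch` and the URL list are placeholders standing in for the per-image downloads that ``read_images()`` fans out for you:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    # Stand-in for the per-image download; a real version would issue an HTTP GET.
    return f"bytes-of-{url}"

urls = [f"img-{i}.jpg" for i in range(8)]

# Fan the I/O-bound downloads out across a pool of workers -- the same idea
# that read_images() applies across processes on one machine, or across nodes
# on a cluster.
with ThreadPoolExecutor(max_workers=4) as pool:
    images = list(pool.map(fetch, urls))

print(len(images))  # → 8
```

Threads suit the download step because it is network-bound; Ray generalizes the same fan-out across processes and machines.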
# ------------------------------------------
#
# For batch inference, wrap the model in a class. By passing a class to
# ``map_batches()``, Ray creates **Actor** processes that recycle state between
users won't know what an Actor is - maybe just say:
Ray Data will spawn copies of this class on different processes across the cluster, called Actors. These actors preserve state between batches -- each one loads the model upon instantiation and avoids repeating the model download overhead by keeping it warm for subsequent batches.
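A plain-Python sketch (no Ray required, and the toy doubling "model" is a stand-in) of why passing a class rather than a function helps: the instance is constructed once, then called once per batch, so the expensive setup in ``__init__`` is never repeated:

```python
class Classifier:
    """Stateful-callable pattern behind passing a class to ``map_batches()``:
    expensive setup runs once in ``__init__`` and is reused for every batch."""

    def __init__(self):
        # Stand-in for downloading/loading a model; with Ray actors this runs
        # once per actor process, not once per batch.
        self.model = lambda xs: [x * 2 for x in xs]
        self.batches_seen = 0

    def __call__(self, batch):
        self.batches_seen += 1
        return self.model(batch)

clf = Classifier()            # Ray would do this once per actor process
out = clf([1, 2]) + clf([3, 4])
print(out, clf.batches_seen)  # → [2, 4, 6, 8] 2
```

With a plain function, the setup cost would be paid on every batch; the class amortizes it across all batches the actor processes.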
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>

Adds a new Ray Data tutorial. I will move this out of draft after review from the Ray Data team.
cc @pcmoritz @robertnishihara @matthewdeng @richardliaw @akshay-anyscale