[Feature] Diffusion Actor DDPMModule #3596
Conversation
…conditioned on observations using a fixed linear-beta DDPM scheduler, following Diffusion Policy (Chi et al., RSS 2023).
@vmoens could I get some feedback on it? Pretty straightforward implementation of a diffusion actor.
vmoens
left a comment
Thanks, left a couple of comments and fixed the linter
@vmoens should be good to go.
@vmoens I believe CI still needs to be updated.
Implements the ε-prediction denoising loss from Diffusion Policy (Chi et al., RSS 2023) as a TorchRL `LossModule`, completing Phase 1 of the diffusion policy feature alongside `DiffusionActor` (pytorch#3596).

- `torchrl/objectives/diffusion_bc.py`: `DiffusionBCLoss` subclassing `LossModule`; uses `_DDPMModule.add_noise()` for the forward diffusion step and computes MSE between predicted and actual noise. Supports configurable reduction and `set_keys()` for observation/action key remapping.
- `torchrl/objectives/__init__.py`: register `DiffusionBCLoss` in alphabetical order.
- `test/objectives/test_diffusion_bc.py`: 17 tests covering output keys, scalar loss, backward, gradient flow, reduction modes, custom keys, and a training convergence check.
- `examples/diffusion_bc_pendulum.py`: end-to-end BC training on Pendulum-v1 with expert data collection, a training loop, and evaluation.
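The objective described above reduces to a few lines: sample a diffusion step and Gaussian noise, diffuse the expert action forward, and regress the score network's prediction onto that noise. A minimal pure-torch sketch of the idea — not the PR's actual `DiffusionBCLoss` code; the function name and input layout are assumptions:

```python
import torch
import torch.nn.functional as F

def diffusion_bc_loss(score_network, alpha_bars, observation, action):
    """Sketch of the eps-prediction denoising loss (Chi et al., RSS 2023).

    alpha_bars: precomputed cumulative products of (1 - beta_t).
    """
    num_steps = alpha_bars.shape[0]
    batch = action.shape[0]
    # Sample a random diffusion step and Gaussian noise per sample
    t = torch.randint(0, num_steps, (batch,))
    noise = torch.randn_like(action)
    # Forward diffusion (the role _DDPMModule.add_noise plays in the PR)
    abar = alpha_bars[t].unsqueeze(-1)
    noisy_action = abar.sqrt() * action + (1 - abar).sqrt() * noise
    # Predict the noise from [noisy_action || observation || timestep]
    t_emb = (t.float() / num_steps).unsqueeze(-1)
    pred = score_network(torch.cat([noisy_action, observation, t_emb], dim=-1))
    # MSE between predicted and actual noise
    return F.mse_loss(pred, noise)
```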
@theap06 the linter is failing.
@vmoens done
vmoens
left a comment
We still need this to appear in the docs (in docs/) so it shows up in the API reference!
Done
Motivation and Implementation

Fixes Phase 1 of PR #3149.

**`DDPMModule` (internal `nn.Module`)**

- Linear beta schedule (`beta_start` → `beta_end`), precomputed and registered as buffers
- Full DDPM reverse chain: starts from Gaussian noise and runs `num_steps` denoising iterations conditioned on the observation
- Input to the score network: `[noisy_action || observation || timestep]`, concatenated on the last dim
**`DiffusionActor` (public, subclasses `SafeModule`)**

- TensorDict contract: `in_keys=["observation"]` → `out_keys=["action"]` (overridable)
- Default score network: 2-hidden-layer MLP (256 units wide, SiLU activations)
- Pluggable `score_network`, `num_steps`, `beta_start`, `beta_end`, `spec`
- Exported from `torchrl.modules` and `torchrl.modules.tensordict_module`
**Tests (6 tests in `TestDiffusionActor`)**

- Output shape (batched and unbatched)
- Custom `in_keys`/`out_keys`
- Custom score network
- Spec wrapping
- Gradient flow
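The gradient-flow check listed above can be illustrated with a self-contained sketch: run one differentiable denoising step (without `torch.no_grad`) and verify that gradients reach the score network's parameters. This is an assumed shape for such a test, not the PR's actual test code:

```python
import torch

def test_gradient_flow():
    torch.manual_seed(0)
    # Tiny stand-in score network: [action(2) || obs(3) || t(1)] -> noise(2)
    net = torch.nn.Linear(6, 2)
    obs = torch.randn(4, 3)
    a = torch.randn(4, 2)
    # One differentiable denoising step (no torch.no_grad here)
    t_emb = torch.full((4, 1), 0.5)
    eps = net(torch.cat([a, obs, t_emb], dim=-1))
    a = (a - 0.1 * eps) / (0.9 ** 0.5)
    # Gradients from the denoised action must reach the score network
    a.sum().backward()
    assert net.weight.grad is not None
    assert net.weight.grad.abs().sum() > 0

test_gradient_flow()
```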
**What's NOT in this PR (future phases)**

- `DiffusionBCLoss` (imitation learning objective)
- `examples/diffusion_bc_pendulum.py`
- DDIM / faster samplers
- CNN encoder for pixel observations