[Feature] Diffusion Actor DDPMModule#3596

Merged
vmoens merged 8 commits into pytorch:main from theap06:feat/diffusion-actor
Apr 9, 2026

Conversation

@theap06
Contributor

@theap06 theap06 commented Apr 5, 2026

Motivation and Implementation

Fixes Phase 1 of PR #3149.

DDPMModule (internal `nn.Module`)

- Linear beta schedule (`beta_start` → `beta_end`) precomputed and registered as buffers
- Full DDPM reverse chain: starts from Gaussian noise and runs `num_steps` denoising iterations conditioned on the observation
- Input to the score network: `[noisy_action || observation || timestep]`, concatenated on the last dim
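The reverse chain described above can be sketched in plain PyTorch. This is an illustrative reconstruction based only on the bullet points, not the actual torchrl implementation; the class name `TinyDDPM` and all internals are hypothetical:

```python
import torch
import torch.nn as nn

class TinyDDPM(nn.Module):
    """Hypothetical sketch of a DDPM reverse chain conditioned on observations."""

    def __init__(self, score_network, action_dim, num_steps=10,
                 beta_start=1e-4, beta_end=0.02):
        super().__init__()
        self.score_network = score_network
        self.action_dim = action_dim
        self.num_steps = num_steps
        # Linear beta schedule, precomputed and registered as buffers
        betas = torch.linspace(beta_start, beta_end, num_steps)
        alphas = 1.0 - betas
        self.register_buffer("betas", betas)
        self.register_buffer("alphas", alphas)
        self.register_buffer("alpha_bars", torch.cumprod(alphas, dim=0))

    @torch.no_grad()
    def sample(self, obs):
        # Start from Gaussian noise and denoise for num_steps iterations
        x = torch.randn(*obs.shape[:-1], self.action_dim)
        for t in reversed(range(self.num_steps)):
            t_embed = torch.full((*obs.shape[:-1], 1), float(t))
            # Score-network input: [noisy_action || observation || timestep]
            eps = self.score_network(torch.cat([x, obs, t_embed], dim=-1))
            alpha, alpha_bar, beta = self.alphas[t], self.alpha_bars[t], self.betas[t]
            mean = (x - beta / torch.sqrt(1 - alpha_bar) * eps) / torch.sqrt(alpha)
            noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
            x = mean + torch.sqrt(beta) * noise
        return x
```

The final step (t = 0) adds no noise, so the output is the posterior mean, which is the standard DDPM sampling convention.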
DiffusionActor (public, subclasses `SafeModule`)

- TensorDict contract: `in_keys=["observation"]` → `out_keys=["action"]` (overridable)
- Default score network: 2-hidden-layer MLP (256 wide, SiLU activations)
- Pluggable `score_network`, `num_steps`, `beta_start`, `beta_end`, `spec`
- Exported from `torchrl.modules` and `torchrl.modules.tensordict_module`
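The in_keys/out_keys contract can be illustrated with a stripped-down stand-in that uses plain dicts in place of TensorDict. Everything here (`DictActor`, `default_score_network`) is a hypothetical sketch, not torchrl's actual API:

```python
import torch
import torch.nn as nn

def default_score_network(in_dim, out_dim, width=256):
    # Mirrors the default described above: 2 hidden layers, 256 wide, SiLU
    return nn.Sequential(
        nn.Linear(in_dim, width), nn.SiLU(),
        nn.Linear(width, width), nn.SiLU(),
        nn.Linear(width, out_dim),
    )

class DictActor:
    """Toy stand-in for the TensorDict contract: reads in_keys, writes out_keys."""

    def __init__(self, module, in_keys=("observation",), out_keys=("action",)):
        self.module = module
        self.in_keys = list(in_keys)
        self.out_keys = list(out_keys)

    def __call__(self, data: dict) -> dict:
        obs = data[self.in_keys[0]]
        data[self.out_keys[0]] = self.module(obs)
        return data
```

Overriding the keys is then just a matter of passing different tuples, e.g. `DictActor(net, in_keys=("obs",), out_keys=("act",))`, which is the behavior the "Custom in_keys/out_keys" test below exercises.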
Tests (6 tests in `TestDiffusionActor`)

- Output shape (batched + unbatched)
- Custom `in_keys`/`out_keys`
- Custom score network
- Spec wrapping
- Gradient flow
What's NOT in this PR (future phases):

- `DiffusionBCLoss` (imitation learning objective)
- `examples/diffusion_bc_pendulum.py`
- DDIM / faster samplers
- CNN encoder for pixel observations

…tioned on observations using a fixed linear-beta DDPM scheduler, following Diffusion Policy (Chi et al., RSS 2023).
@pytorch-bot

pytorch-bot bot commented Apr 5, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3596

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 3 New Failures, 1 Cancelled Job

As of commit aeec61f with merge base f54a7c7 (image):

NEW FAILURES - The following jobs have failed:

CANCELLED JOB - The following job was cancelled. Please retry:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 5, 2026
@github-actions
Contributor

github-actions bot commented Apr 5, 2026

⚠️ PR Title Label Error

PR title must start with a label prefix in brackets (e.g., [BugFix]).

Current title: Implements a diffusion-based actor that denoises latent actions condi…

Supported Prefixes (case-sensitive)

Your PR title must start with exactly one of these prefixes:

| Prefix | Label Applied | Example |
| --- | --- | --- |
| [BugFix] | BugFix | [BugFix] Fix memory leak in collector |
| [Feature] | Feature | [Feature] Add new optimizer |
| [Doc] or [Docs] | Documentation | [Doc] Update installation guide |
| [Refactor] | Refactoring | [Refactor] Clean up module imports |
| [CI] | CI | [CI] Fix workflow permissions |
| [Test] or [Tests] | Tests | [Tests] Add unit tests for buffer |
| [Environment] or [Environments] | Environments | [Environments] Add Gymnasium support |
| [Data] | Data | [Data] Fix replay buffer sampling |
| [Performance] or [Perf] | Performance | [Performance] Optimize tensor ops |
| [BC-Breaking] | bc breaking | [BC-Breaking] Remove deprecated API |
| [Deprecation] | Deprecation | [Deprecation] Mark old function |
| [Quality] | Quality | [Quality] Fix typos and add codespell |

Note: Common variations like singular/plural are supported (e.g., [Doc] or [Docs]).

@theap06 theap06 changed the title Implements a diffusion-based actor that denoises latent actions condi… [Feature] Diffusion Actor DDPMModule Apr 5, 2026
@github-actions github-actions bot added the Feature New feature label Apr 5, 2026
@theap06
Contributor Author

theap06 commented Apr 6, 2026

@vmoens could I get some feedback on this? It's a pretty straightforward implementation of a diffusion actor.

Collaborator

@vmoens vmoens left a comment


Thanks, left a couple of comments and fixed the linter

Comment thread torchrl/modules/tensordict_module/actors.py
Comment thread torchrl/modules/tensordict_module/actors.py Outdated
Comment thread torchrl/modules/tensordict_module/actors.py
Comment thread torchrl/modules/tensordict_module/actors.py Outdated
Comment thread torchrl/modules/tensordict_module/actors.py
Comment thread torchrl/modules/tensordict_module/__init__.py Outdated
Comment thread torchrl/modules/tensordict_module/actors.py Outdated
@theap06
Contributor Author

theap06 commented Apr 8, 2026

@vmoens should be good to go.

@theap06 theap06 requested a review from vmoens April 8, 2026 09:48
Collaborator

@vmoens vmoens left a comment


LGTM thanks!

@theap06 theap06 requested a review from vmoens April 8, 2026 14:34
@theap06
Contributor Author

theap06 commented Apr 8, 2026

@vmoens I believe CI still needs to be updated.

theap06 added a commit to theap06/rl that referenced this pull request Apr 8, 2026
Implements the ε-prediction denoising loss from Diffusion Policy (Chi et al.,
RSS 2023) as a TorchRL LossModule, completing Phase-1 of the diffusion policy
feature alongside DiffusionActor (pytorch#3596).

- torchrl/objectives/diffusion_bc.py: DiffusionBCLoss subclassing LossModule,
  uses _DDPMModule.add_noise() for the forward diffusion step and computes
  MSE between predicted and actual noise. Supports configurable reduction and
  set_keys() for observation/action key remapping.
- torchrl/objectives/__init__.py: register DiffusionBCLoss in alphabetical order.
- test/objectives/test_diffusion_bc.py: 17 tests covering output keys, scalar
  loss, backward, gradient flow, reduction modes, custom keys, and a training
  convergence check.
- examples/diffusion_bc_pendulum.py: end-to-end BC training on Pendulum-v1
  with expert data collection, training loop, and evaluation.
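The ε-prediction objective described in this commit message can be sketched as follows. This is a hedged reconstruction in plain PyTorch, not the actual `DiffusionBCLoss`: the forward-diffusion step stands in for what `add_noise()` would do, key handling is simplified away, and the function name and signature are illustrative:

```python
import torch
import torch.nn.functional as F

def diffusion_bc_loss(score_network, action, obs, alpha_bars):
    """Sketch: noise expert actions (forward diffusion), predict the noise, MSE."""
    batch = action.shape[0]
    # Sample a random diffusion timestep per batch element
    t = torch.randint(0, alpha_bars.shape[0], (batch,))
    eps = torch.randn_like(action)
    ab = alpha_bars[t].unsqueeze(-1)
    # Forward diffusion step (what add_noise would perform)
    noisy = ab.sqrt() * action + (1 - ab).sqrt() * eps
    # Predict the noise from [noisy_action || observation || timestep]
    pred = score_network(torch.cat([noisy, obs, t.float().unsqueeze(-1)], dim=-1))
    # MSE between predicted and actual noise
    return F.mse_loss(pred, eps)
```

Since the loss is a differentiable function of the score network's output, gradients flow back through `pred`, which is what the backward/gradient-flow tests in the commit would check.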
@theap06 theap06 mentioned this pull request Apr 8, 2026
3 tasks
@vmoens
Collaborator

vmoens commented Apr 8, 2026

@theap06 the linter is failing

@theap06
Contributor Author

theap06 commented Apr 8, 2026

@vmoens done

Collaborator

@vmoens vmoens left a comment


We still need this to appear in the docs (in docs/) so it shows up in the API reference!

@github-actions github-actions bot added the Documentation Improvements or additions to documentation label Apr 9, 2026
@theap06
Contributor Author

theap06 commented Apr 9, 2026

We still need this to appear in the docs (in docs/) so it shows up in the API reference!

Done

@theap06 theap06 requested a review from vmoens April 9, 2026 21:39
@vmoens vmoens merged commit b5b92de into pytorch:main Apr 9, 2026
99 of 103 checks passed
vmoens pushed a commit to theap06/rl that referenced this pull request Apr 10, 2026

Labels

- CLA Signed: This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
- Documentation: Improvements or additions to documentation
- Feature: New feature
- Modules


2 participants