[Feature] Diffusion Actor DDPMModule #3596
Conversation
…conditioned on observations using a fixed linear-beta DDPM scheduler, following Diffusion Policy (Chi et al., RSS 2023).
@vmoens could I get some feedback on it? Pretty straightforward implementation of a diffusion actor.
vmoens
left a comment
Thanks, left a couple of comments and fixed the linter
@vmoens should be good to go.
@vmoens I believe CI still needs to be updated.
Implements the ε-prediction denoising loss from Diffusion Policy (Chi et al., RSS 2023) as a TorchRL `LossModule`, completing Phase 1 of the diffusion policy feature alongside `DiffusionActor` (pytorch#3596).

- `torchrl/objectives/diffusion_bc.py`: `DiffusionBCLoss` subclassing `LossModule`; uses `_DDPMModule.add_noise()` for the forward diffusion step and computes MSE between predicted and actual noise. Supports configurable reduction and `set_keys()` for observation/action key remapping.
- `torchrl/objectives/__init__.py`: register `DiffusionBCLoss` in alphabetical order.
- `test/objectives/test_diffusion_bc.py`: 17 tests covering output keys, scalar loss, backward, gradient flow, reduction modes, custom keys, and a training convergence check.
- `examples/diffusion_bc_pendulum.py`: end-to-end BC training on Pendulum-v1 with expert data collection, a training loop, and evaluation.
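The objective described above reduces to a few lines: sample a diffusion step and Gaussian noise, diffuse the expert action forward, and regress the score network's prediction onto that noise. A minimal pure-torch sketch of the idea — not the PR's actual `DiffusionBCLoss` code; the function name and input layout are assumptions:

```python
import torch
import torch.nn.functional as F

def diffusion_bc_loss(score_network, alpha_bars, observation, action):
    """Sketch of the eps-prediction denoising loss (Chi et al., RSS 2023).

    alpha_bars: precomputed cumulative products of (1 - beta_t).
    """
    num_steps = alpha_bars.shape[0]
    batch = action.shape[0]
    # Sample a random diffusion step and Gaussian noise per sample
    t = torch.randint(0, num_steps, (batch,))
    noise = torch.randn_like(action)
    # Forward diffusion (the role _DDPMModule.add_noise plays in the PR)
    abar = alpha_bars[t].unsqueeze(-1)
    noisy_action = abar.sqrt() * action + (1 - abar).sqrt() * noise
    # Predict the noise from [noisy_action || observation || timestep]
    t_emb = (t.float() / num_steps).unsqueeze(-1)
    pred = score_network(torch.cat([noisy_action, observation, t_emb], dim=-1))
    # MSE between predicted and actual noise
    return F.mse_loss(pred, noise)
```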
@theap06 the linter is failing.
@vmoens done
vmoens
left a comment
We still need this to appear in the docs (in docs/) so it shows up in the API reference!
Done
Motivation and Implementation

Fixes Phase 1 of PR #3149.

**`DDPMModule` (internal `nn.Module`)**

- Linear beta schedule (`beta_start` → `beta_end`), precomputed and registered as buffers
- Full DDPM reverse chain: starts from Gaussian noise and runs `num_steps` denoising iterations conditioned on the observation
- Input to the score network: `[noisy_action || observation || timestep]`, concatenated on the last dim
**`DiffusionActor` (public, subclasses `SafeModule`)**

- TensorDict contract: `in_keys=["observation"]` → `out_keys=["action"]` (overridable)
- Default score network: 2-hidden-layer MLP (256 units wide, SiLU activations)
- Pluggable `score_network`, `num_steps`, `beta_start`, `beta_end`, `spec`
- Exported from `torchrl.modules` and `torchrl.modules.tensordict_module`
**Tests (6 tests in `TestDiffusionActor`)**

- Output shape (batched and unbatched)
- Custom `in_keys`/`out_keys`
- Custom score network
- Spec wrapping
- Gradient flow
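The gradient-flow check listed above can be illustrated with a self-contained sketch: run one differentiable denoising step (without `torch.no_grad`) and verify that gradients reach the score network's parameters. This is an assumed shape for such a test, not the PR's actual test code:

```python
import torch

def test_gradient_flow():
    torch.manual_seed(0)
    # Tiny stand-in score network: [action(2) || obs(3) || t(1)] -> noise(2)
    net = torch.nn.Linear(6, 2)
    obs = torch.randn(4, 3)
    a = torch.randn(4, 2)
    # One differentiable denoising step (no torch.no_grad here)
    t_emb = torch.full((4, 1), 0.5)
    eps = net(torch.cat([a, obs, t_emb], dim=-1))
    a = (a - 0.1 * eps) / (0.9 ** 0.5)
    # Gradients from the denoised action must reach the score network
    a.sum().backward()
    assert net.weight.grad is not None
    assert net.weight.grad.abs().sum() > 0

test_gradient_flow()
```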
**What's NOT in this PR (future phases)**

- `DiffusionBCLoss` (imitation learning objective)
- `examples/diffusion_bc_pendulum.py`
- DDIM / faster samplers
- CNN encoder for pixel observations