[Feature, Example] A3C Atari Implementation for TorchRL #3001
simeetnayan81 wants to merge 17 commits into pytorch:main
Conversation
🔗 Helpful links: see artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3001
Note: links to docs will display an error until the docs builds have completed.
CI status as of commit 8e7f96c (merge base c764978): 17 workflows awaiting approval before CI can run; 1 new job failure. (Generated by Dr. CI.)
vmoens
left a comment
This all looks pretty good!
Could you share a learning curve (or a couple)?
Another thing to do before landing is to add it to the sota-implementations CI run:
https://github.com/pytorch/rl/blob/main/.github/unittest/linux_sota/scripts/test_sota.py
Make sure the config passed there is as barebones as possible: we just want to run the script for a couple of collection/optim iterations and make sure it runs without error (not that it trains properly).
We also need to add it to the sota-check runs
Thanks @vmoens. I'll add the required changes as well as some training curves.
@vmoens, I have added the required scripts as well. I'm not getting enough resources and time for hyperparameter tuning to generate a proper training curve.
vmoens
left a comment
LGTM, just a minor comment on the logger!
```python
logger = get_logger(
    cfg.logger.backend,
    logger_name="a3c",
    experiment_name=exp_name,
    wandb_kwargs={
        "config": dict(cfg),
        "project": cfg.logger.project_name,
        "group": cfg.logger.group_name,
    },
)
```
What I usually see is that the logger is only passed to the first worker.
Another thing: you may want to assume that the logger isn't serializable and should be instantiated locally within the worker.
Oh yeah, I did that because I thought logging any single worker should be a good representative of the global model, since the weights are copied anyway. Logging all the workers might not be very useful, but that can be done as well.
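A minimal sketch of the pattern the reviewer suggests, with a hypothetical `rank` argument identifying the worker: only the first worker builds a logger, and the import happens inside the function so the (possibly non-picklable) logger never crosses a process boundary.

```python
def make_worker_logger(rank, backend="wandb", exp_name="a3c"):
    """Instantiate the logger locally inside the worker process.

    Only the first worker (rank 0) logs; the logger object itself
    is never pickled and sent to other processes.
    """
    if rank != 0:
        return None
    # Imported here so the logger machinery is only touched inside
    # the single worker that actually logs.
    from torchrl.record.loggers import get_logger

    return get_logger(backend, logger_name="a3c", experiment_name=exp_name)
```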
```python
num_workers = cfg.multiprocessing.num_workers

if num_workers is None:
    num_workers = mp.cpu_count()
```
we should have way fewer workers - I think we need users to tell us how many.
That can be configured in config_atari. Do you want me to explicitly set it to some constant here?
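One way to reconcile both points, sketched with a hypothetical helper: require an explicit value from the config, and cap the `cpu_count()` fallback so a bare run never spawns dozens of workers.

```python
import multiprocessing as mp


def resolve_num_workers(num_workers, cap=8):
    """Resolve the worker count, falling back to a capped CPU count."""
    if num_workers is None:
        # Conservative default: never more than `cap` workers.
        return min(mp.cpu_count(), cap)
    if num_workers < 1:
        raise ValueError("num_workers must be >= 1")
    return num_workers
```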
```python
data_reshape = data.reshape(-1)
losses = []

mini_batches = data_reshape.split(self.mini_batch_size)
```
To shuffle things a bit, I usually rely on a replay buffer instance rather than just splitting the data.
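A sketch of the idea: shuffle before splitting, so consecutive transitions are decorrelated across mini-batches. In TorchRL the same effect is typically obtained by pushing the batch through a `ReplayBuffer` with a `SamplerWithoutReplacement`; the plain-tensor version below just illustrates the principle.

```python
import torch


def shuffled_minibatches(data, mini_batch_size):
    """Split `data` into mini-batches after a random permutation."""
    perm = torch.randperm(data.shape[0])
    return data[perm].split(mini_batch_size)
```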
```python
for local_param, global_param in zip(
    self.local_actor.parameters(), self.global_actor.parameters()
):
    global_param._grad = local_param.grad

for local_param, global_param in zip(
    self.local_critic.parameters(), self.global_critic.parameters()
):
    global_param._grad = local_param.grad

gn = torch.nn.utils.clip_grad_norm_(
    self.loss_module.parameters(), max_norm=max_grad_norm
)
```
Can you explain what we do here? What do we use `_grad` for?
`_grad` is used to store the gradients for each parameter.
We copy the local gradients to the global model so the global model can be updated by the optimizer.
This is a key step in A3C, where multiple workers asynchronously update a shared global model.
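To make the mechanism concrete, here is a stripped-down sketch (the function name is hypothetical): each global parameter's `.grad` is pointed at the gradient tensor computed on the worker's local replica, so a subsequent `optimizer.step()` on the global parameters consumes the worker's gradients.

```python
import torch


def push_local_grads(local_model, global_model):
    """Expose the worker's gradients to the shared optimizer.

    Assigning to `_grad` sets each global parameter's `.grad` to the
    gradient tensor computed on the local replica.
    """
    for local_param, global_param in zip(
        local_model.parameters(), global_model.parameters()
    ):
        global_param._grad = local_param.grad
```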
```python
torch.set_float32_matmul_precision("high")


class SharedAdam(torch.optim.Adam):
```
Shouldn't we move this to the utils file?
Sure, will do it.
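For reference, the usual shared-Adam pattern looks roughly like this (a sketch, not necessarily the exact code in this PR): pre-populate the optimizer state eagerly and move its moment buffers into shared memory, so every worker process steps the same optimizer state.

```python
import torch


class SharedAdam(torch.optim.Adam):
    """Adam whose moment buffers live in shared memory, so all
    worker processes update a single shared optimizer state."""

    def __init__(self, params, lr=1e-4, betas=(0.9, 0.999), eps=1e-8):
        super().__init__(params, lr=lr, betas=betas, eps=eps)
        for group in self.param_groups:
            for p in group["params"]:
                state = self.state[p]
                # Initialize state eagerly (Adam normally does this
                # lazily on the first step) and share the buffers.
                state["step"] = torch.tensor(0.0)
                state["exp_avg"] = torch.zeros_like(p.data).share_memory_()
                state["exp_avg_sq"] = torch.zeros_like(p.data).share_memory_()
```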
Force-pushed d95de87 to a6eb18d (compare).
vmoens
left a comment
I made a few edits.
Can you explain the way the params are shared and updated? I'm not sure I see the logic
There is a global model (shared across all workers) and a local model (each worker has its own copy).
```python
gn = torch.nn.utils.clip_grad_norm_(
    self.loss_module.parameters(), max_norm=max_grad_norm
)

self.optimizer.step()
```
After this line, we should sync the weights from global.
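A sketch of that sync step (hypothetical helper name): once the shared optimizer has updated the global weights, the worker refreshes its local replica before collecting the next rollout.

```python
import torch


def sync_from_global(local_model, global_model):
    """Copy the freshly updated global weights into the local replica."""
    local_model.load_state_dict(global_model.state_dict())
```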
| Prefix | Label Applied | Example |
|---|---|---|
| [BugFix] | BugFix | [BugFix] Fix memory leak in collector |
| [Feature] | Feature | [Feature] Add new optimizer |
| [Doc] or [Docs] | Documentation | [Doc] Update installation guide |
| [Refactor] | Refactoring | [Refactor] Clean up module imports |
| [CI] | CI | [CI] Fix workflow permissions |
| [Test] or [Tests] | Tests | [Tests] Add unit tests for buffer |
| [Environment] or [Environments] | Environments | [Environments] Add Gymnasium support |
| [Data] | Data | [Data] Fix replay buffer sampling |
| [Performance] or [Perf] | Performance | [Performance] Optimize tensor ops |
| [BC-Breaking] | bc breaking | [BC-Breaking] Remove deprecated API |
| [Deprecation] | Deprecation | [Deprecation] Mark old function |
| [Quality] | Quality | [Quality] Fix typos and add codespell |

Note: Common variations like singular/plural are supported (e.g., [Doc] or [Docs]).
Description
This PR adds an implementation of the Asynchronous Advantage Actor-Critic (A3C) algorithm for Atari environments in the torchrl/sota-implementations directory. The main files added are:
a3c_atari.py: Contains the A3C worker class, shared optimizer, and main training loop using multiprocessing.
utils_atari.py: Provides utility functions for environment creation, model construction, and evaluation, adapted for Atari tasks.
config_atari.yaml: Configuration file for hyperparameters, environment settings, and logging.
The implementation leverages TorchRL's collectors, objectives, and logging utilities, and is designed to be modular and extensible for research and benchmarking. Some of the utility functions are also adapted from a2c_atari.
Motivation and Context
This change is required to provide a strong, reproducible baseline for A3C on Atari environments using TorchRL. It enables researchers and practitioners to benchmark and compare reinforcement learning algorithms within the TorchRL ecosystem. The implementation follows best practices for distributed RL and is compatible with TorchRL's API.
This PR solves issue #1755.
Types of changes
What types of changes does your code introduce?

Checklist
Go over all the following points, and put an `x` in all the boxes that apply. If you are unsure about any of these, don't hesitate to ask. We are here to help!