
[Performance] Add fast path for step() and TransformedEnv._step() when _trust_step_output is set #3565

Closed
vmoens wants to merge 2 commits into gh/vmoens/246/base from gh/vmoens/246/head

Conversation

Collaborator

@vmoens vmoens commented Mar 23, 2026

[ghstack-poisoned]

pytorch-bot bot commented Mar 23, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3565

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There are 1 currently active SEVs. If your PR is affected, please view them below:

❌ 3 New Failures, 14 Pending

As of commit 9d0068c with merge base a4301ee:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

vmoens added a commit that referenced this pull request Mar 23, 2026
…n _trust_step_output is set

When _trust_step_output is True, EnvBase.step() skips _assert_tensordict_shape,
partial_steps handling, next_preset logic, and _step_proc_data. Similarly,
TransformedEnv._step() skips partial_steps, next_preset, and _complete_done.
This eliminates all per-step Python validation overhead for well-behaved envs.

Made-with: Cursor
ghstack-source-id: 52ff860
Pull-Request: #3565
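
The commit message above describes a "trust" flag that bypasses per-step checks. The following is a minimal sketch of that general pattern, not TorchRL's implementation: `ToyEnv`, `trust_step_output`, `_validate`, and the dict-of-lists data layout are all hypothetical stand-ins chosen so the example runs without any dependencies.

```python
from dataclasses import dataclass, field


@dataclass
class ToyEnv:
    """Hypothetical stand-in for an environment; not TorchRL's EnvBase."""

    batch_size: int = 4
    trust_step_output: bool = False  # assumed analogue of _trust_step_output
    checks_run: int = field(default=0, init=False)

    def _step(self, data: dict) -> dict:
        # Backend step: every observation advances by its action.
        return {"obs": [o + a for o, a in zip(data["obs"], data["action"])]}

    def _validate(self, data: dict) -> None:
        # Per-step Python validation that the fast path is meant to avoid.
        self.checks_run += 1
        for key, value in data.items():
            if len(value) != self.batch_size:
                raise ValueError(f"{key!r} has length {len(value)}, expected {self.batch_size}")

    def step(self, data: dict) -> dict:
        if self.trust_step_output:
            # Fast path: call the backend directly, with no checks at all.
            return self._step(data)
        # Slow path: validate the input, step, then validate the output.
        self._validate(data)
        out = self._step(data)
        self._validate(out)
        return out


data = {"obs": [0, 1, 2, 3], "action": [1, 1, 1, 1]}
safe_env = ToyEnv()
fast_env = ToyEnv(trust_step_output=True)
safe_out = safe_env.step(data)
fast_out = fast_env.step(data)
```

For well-formed input both paths return the same result; the only difference is that the trusted path never runs the checks, so malformed input would pass through unnoticed.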
@github-actions github-actions bot added the Performance and Transforms labels Mar 23, 2026
@meta-cla meta-cla bot added the CLA Signed label Mar 23, 2026
Contributor

github-actions bot commented Mar 23, 2026

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 172. Improved: $\large\color{#35bf28}10$. Worsened: $\large\color{#d91a1a}11$.

Expand to view detailed results
Name Max Mean Ops Ops on Repo HEAD Change
test_tensor_to_bytestream_speed[pickle] 80.9338μs 80.1621μs 12.4747 KOps/s 12.2956 KOps/s $\color{#35bf28}+1.46\%$
test_tensor_to_bytestream_speed[torch.save] 0.1406ms 0.1403ms 7.1287 KOps/s 6.8444 KOps/s $\color{#35bf28}+4.15\%$
test_tensor_to_bytestream_speed[untyped_storage] 0.1110s 0.1106s 9.0401 Ops/s 8.8305 Ops/s $\color{#35bf28}+2.37\%$
test_tensor_to_bytestream_speed[numpy] 2.5594μs 2.5511μs 391.9899 KOps/s 377.3300 KOps/s $\color{#35bf28}+3.89\%$
test_tensor_to_bytestream_speed[safetensors] 36.3099μs 36.1425μs 27.6683 KOps/s 26.7156 KOps/s $\color{#35bf28}+3.57\%$
test_simple 0.5480s 0.5465s 1.8300 Ops/s 1.7416 Ops/s $\textbf{\color{#35bf28}+5.07\%}$
test_transformed 1.0877s 1.0868s 0.9202 Ops/s 0.8907 Ops/s $\color{#35bf28}+3.31\%$
test_serial 1.7047s 1.7018s 0.5876 Ops/s 0.5780 Ops/s $\color{#35bf28}+1.65\%$
test_parallel 1.0276s 1.0219s 0.9786 Ops/s 0.9465 Ops/s $\color{#35bf28}+3.39\%$
test_step_mdp_speed[True-True-True-True-True] 0.2767ms 41.4369μs 24.1331 KOps/s 24.4066 KOps/s $\color{#d91a1a}-1.12\%$
test_step_mdp_speed[True-True-True-True-False] 60.5110μs 22.9578μs 43.5582 KOps/s 43.3383 KOps/s $\color{#35bf28}+0.51\%$
test_step_mdp_speed[True-True-True-False-True] 54.9210μs 24.1120μs 41.4731 KOps/s 42.5462 KOps/s $\color{#d91a1a}-2.52\%$
test_step_mdp_speed[True-True-True-False-False] 47.4410μs 12.9636μs 77.1393 KOps/s 77.6405 KOps/s $\color{#d91a1a}-0.65\%$
test_step_mdp_speed[True-True-False-True-True] 80.1710μs 44.4392μs 22.5027 KOps/s 22.6066 KOps/s $\color{#d91a1a}-0.46\%$
test_step_mdp_speed[True-True-False-True-False] 66.2110μs 25.7332μs 38.8603 KOps/s 38.3945 KOps/s $\color{#35bf28}+1.21\%$
test_step_mdp_speed[True-True-False-False-True] 55.3910μs 26.0921μs 38.3258 KOps/s 37.6782 KOps/s $\color{#35bf28}+1.72\%$
test_step_mdp_speed[True-True-False-False-False] 48.9010μs 15.4194μs 64.8532 KOps/s 63.8476 KOps/s $\color{#35bf28}+1.57\%$
test_step_mdp_speed[True-False-True-True-True] 0.1254ms 46.9472μs 21.3005 KOps/s 21.0892 KOps/s $\color{#35bf28}+1.00\%$
test_step_mdp_speed[True-False-True-True-False] 64.0710μs 28.2316μs 35.4213 KOps/s 35.2619 KOps/s $\color{#35bf28}+0.45\%$
test_step_mdp_speed[True-False-True-False-True] 0.4400ms 26.3321μs 37.9765 KOps/s 38.1030 KOps/s $\color{#d91a1a}-0.33\%$
test_step_mdp_speed[True-False-True-False-False] 0.4422ms 15.3216μs 65.2671 KOps/s 63.6939 KOps/s $\color{#35bf28}+2.47\%$
test_step_mdp_speed[True-False-False-True-True] 85.1510μs 48.9653μs 20.4226 KOps/s 20.2747 KOps/s $\color{#35bf28}+0.73\%$
test_step_mdp_speed[True-False-False-True-False] 0.4544ms 30.6681μs 32.6072 KOps/s 32.0471 KOps/s $\color{#35bf28}+1.75\%$
test_step_mdp_speed[True-False-False-False-True] 0.4797ms 28.2644μs 35.3802 KOps/s 34.2575 KOps/s $\color{#35bf28}+3.28\%$
test_step_mdp_speed[True-False-False-False-False] 0.4399ms 17.8934μs 55.8866 KOps/s 55.2015 KOps/s $\color{#35bf28}+1.24\%$
test_step_mdp_speed[False-True-True-True-True] 82.8110μs 47.2995μs 21.1419 KOps/s 21.1282 KOps/s $\color{#35bf28}+0.06\%$
test_step_mdp_speed[False-True-True-True-False] 0.4859ms 27.9270μs 35.8076 KOps/s 35.3326 KOps/s $\color{#35bf28}+1.34\%$
test_step_mdp_speed[False-True-True-False-True] 2.3599ms 30.3195μs 32.9821 KOps/s 32.8665 KOps/s $\color{#35bf28}+0.35\%$
test_step_mdp_speed[False-True-True-False-False] 51.9410μs 17.0904μs 58.5123 KOps/s 58.3812 KOps/s $\color{#35bf28}+0.22\%$
test_step_mdp_speed[False-True-False-True-True] 0.5042ms 49.7670μs 20.0936 KOps/s 20.5774 KOps/s $\color{#d91a1a}-2.35\%$
test_step_mdp_speed[False-True-False-True-False] 0.4542ms 30.9437μs 32.3168 KOps/s 32.3424 KOps/s $\color{#d91a1a}-0.08\%$
test_step_mdp_speed[False-True-False-False-True] 0.4657ms 31.9049μs 31.3432 KOps/s 31.7038 KOps/s $\color{#d91a1a}-1.14\%$
test_step_mdp_speed[False-True-False-False-False] 47.7610μs 19.5467μs 51.1597 KOps/s 51.3692 KOps/s $\color{#d91a1a}-0.41\%$
test_step_mdp_speed[False-False-True-True-True] 0.4878ms 52.9565μs 18.8834 KOps/s 19.1776 KOps/s $\color{#d91a1a}-1.53\%$
test_step_mdp_speed[False-False-True-True-False] 0.4766ms 33.2106μs 30.1109 KOps/s 29.6786 KOps/s $\color{#35bf28}+1.46\%$
test_step_mdp_speed[False-False-True-False-True] 0.4592ms 31.6724μs 31.5732 KOps/s 30.8979 KOps/s $\color{#35bf28}+2.19\%$
test_step_mdp_speed[False-False-True-False-False] 47.0700μs 19.4859μs 51.3192 KOps/s 50.6200 KOps/s $\color{#35bf28}+1.38\%$
test_step_mdp_speed[False-False-False-True-True] 0.4813ms 53.2620μs 18.7751 KOps/s 18.7571 KOps/s $\color{#35bf28}+0.10\%$
test_step_mdp_speed[False-False-False-True-False] 0.4550ms 35.5916μs 28.0965 KOps/s 27.7295 KOps/s $\color{#35bf28}+1.32\%$
test_step_mdp_speed[False-False-False-False-True] 0.4573ms 33.4630μs 29.8837 KOps/s 29.4384 KOps/s $\color{#35bf28}+1.51\%$
test_step_mdp_speed[False-False-False-False-False] 53.1110μs 22.0182μs 45.4169 KOps/s 44.8066 KOps/s $\color{#35bf28}+1.36\%$
test_non_tensor_env_rollout_speed[1000-single-True] 0.7271s 0.7218s 1.3854 Ops/s 1.3306 Ops/s $\color{#35bf28}+4.11\%$
test_non_tensor_env_rollout_speed[1000-single-False] 0.7093s 0.6021s 1.6607 Ops/s 1.6293 Ops/s $\color{#35bf28}+1.93\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] 1.7388s 1.6483s 0.6067 Ops/s 0.6079 Ops/s $\color{#d91a1a}-0.20\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] 1.5108s 1.4274s 0.7006 Ops/s 0.7000 Ops/s $\color{#35bf28}+0.08\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-True] 1.9866s 1.9028s 0.5255 Ops/s 0.5263 Ops/s $\color{#d91a1a}-0.13\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-False] 1.7742s 1.6831s 0.5942 Ops/s 0.5965 Ops/s $\color{#d91a1a}-0.40\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] 4.6946s 4.6115s 0.2169 Ops/s 0.2171 Ops/s $\color{#d91a1a}-0.09\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] 4.6212s 4.3860s 0.2280 Ops/s 0.2261 Ops/s $\color{#35bf28}+0.85\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] 1.9662s 1.8619s 0.5371 Ops/s 0.5298 Ops/s $\color{#35bf28}+1.38\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] 1.8223s 1.6428s 0.6087 Ops/s 0.6249 Ops/s $\color{#d91a1a}-2.58\%$
test_values[generalized_advantage_estimate-True-True] 10.1399ms 9.9036ms 100.9738 Ops/s 99.0387 Ops/s $\color{#35bf28}+1.95\%$
test_values[vec_generalized_advantage_estimate-True-True] 19.7717ms 17.3217ms 57.7312 Ops/s 56.7647 Ops/s $\color{#35bf28}+1.70\%$
test_values[td0_return_estimate-False-False] 0.2247ms 0.1310ms 7.6344 KOps/s 7.6365 KOps/s $\color{#d91a1a}-0.03\%$
test_values[td1_return_estimate-False-False] 27.4345ms 27.0658ms 36.9469 Ops/s 35.8437 Ops/s $\color{#35bf28}+3.08\%$
test_values[vec_td1_return_estimate-False-False] 17.8404ms 17.3914ms 57.4997 Ops/s 56.3738 Ops/s $\color{#35bf28}+2.00\%$
test_values[td_lambda_return_estimate-True-False] 41.0136ms 40.2998ms 24.8140 Ops/s 24.1669 Ops/s $\color{#35bf28}+2.68\%$
test_values[vec_td_lambda_return_estimate-True-False] 18.6377ms 17.3831ms 57.5271 Ops/s 56.9047 Ops/s $\color{#35bf28}+1.09\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 8.8322ms 8.7444ms 114.3584 Ops/s 112.0286 Ops/s $\color{#35bf28}+2.08\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 1.7995ms 1.5411ms 648.8800 Ops/s 658.7524 Ops/s $\color{#d91a1a}-1.50\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.5717ms 0.4202ms 2.3797 KOps/s 2.3694 KOps/s $\color{#35bf28}+0.43\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 30.9745ms 30.2361ms 33.0731 Ops/s 29.2513 Ops/s $\textbf{\color{#35bf28}+13.07\%}$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 1.9591ms 1.7380ms 575.3800 Ops/s 571.8212 Ops/s $\color{#35bf28}+0.62\%$
test_dqn_speed[False-None] 1.5132ms 1.4254ms 701.5362 Ops/s 704.4545 Ops/s $\color{#d91a1a}-0.41\%$
test_dqn_speed[False-backward] 2.0117ms 1.9631ms 509.4105 Ops/s 514.0191 Ops/s $\color{#d91a1a}-0.90\%$
test_dqn_speed[True-None] 1.5108ms 0.5892ms 1.6971 KOps/s 1.7218 KOps/s $\color{#d91a1a}-1.43\%$
test_dqn_speed[True-backward] 1.0898ms 1.0597ms 943.6410 Ops/s 933.6941 Ops/s $\color{#35bf28}+1.07\%$
test_dqn_speed[reduce-overhead-None] 0.6525ms 0.5625ms 1.7779 KOps/s 1.7652 KOps/s $\color{#35bf28}+0.72\%$
test_ddpg_speed[False-None] 3.2581ms 2.9217ms 342.2625 Ops/s 348.5446 Ops/s $\color{#d91a1a}-1.80\%$
test_ddpg_speed[False-backward] 4.3874ms 4.1820ms 239.1190 Ops/s 242.2997 Ops/s $\color{#d91a1a}-1.31\%$
test_ddpg_speed[True-None] 1.6970ms 1.4785ms 676.3748 Ops/s 658.2724 Ops/s $\color{#35bf28}+2.75\%$
test_ddpg_speed[True-backward] 2.6774ms 2.5465ms 392.6923 Ops/s 351.0471 Ops/s $\textbf{\color{#35bf28}+11.86\%}$
test_ddpg_speed[reduce-overhead-None] 1.6139ms 1.4696ms 680.4757 Ops/s 682.7995 Ops/s $\color{#d91a1a}-0.34\%$
test_sac_speed[False-None] 8.8179ms 8.2455ms 121.2785 Ops/s 123.0920 Ops/s $\color{#d91a1a}-1.47\%$
test_sac_speed[False-backward] 12.2318ms 11.6088ms 86.1414 Ops/s 87.0367 Ops/s $\color{#d91a1a}-1.03\%$
test_sac_speed[True-None] 2.5001ms 2.2835ms 437.9206 Ops/s 430.9479 Ops/s $\color{#35bf28}+1.62\%$
test_sac_speed[True-backward] 4.4618ms 4.3055ms 232.2586 Ops/s 224.5339 Ops/s $\color{#35bf28}+3.44\%$
test_sac_speed[reduce-overhead-None] 2.3847ms 2.2734ms 439.8750 Ops/s 455.2801 Ops/s $\color{#d91a1a}-3.38\%$
test_redq_speed[False-None] 14.0315ms 10.9985ms 90.9213 Ops/s 89.2160 Ops/s $\color{#35bf28}+1.91\%$
test_redq_speed[False-backward] 19.8768ms 18.8356ms 53.0911 Ops/s 54.3606 Ops/s $\color{#d91a1a}-2.34\%$
test_redq_speed[True-None] 5.1877ms 4.8186ms 207.5294 Ops/s 204.2387 Ops/s $\color{#35bf28}+1.61\%$
test_redq_speed[reduce-overhead-None] 5.0047ms 4.7204ms 211.8486 Ops/s 208.1884 Ops/s $\color{#35bf28}+1.76\%$
test_redq_deprec_speed[False-None] 12.1804ms 11.6565ms 85.7888 Ops/s 87.3498 Ops/s $\color{#d91a1a}-1.79\%$
test_redq_deprec_speed[False-backward] 17.2985ms 16.8107ms 59.4861 Ops/s 60.9095 Ops/s $\color{#d91a1a}-2.34\%$
test_redq_deprec_speed[True-None] 4.0532ms 3.8426ms 260.2392 Ops/s 263.9104 Ops/s $\color{#d91a1a}-1.39\%$
test_redq_deprec_speed[True-backward] 8.1226ms 7.8725ms 127.0249 Ops/s 123.9510 Ops/s $\color{#35bf28}+2.48\%$
test_redq_deprec_speed[reduce-overhead-None] 3.9720ms 3.7339ms 267.8192 Ops/s 265.2092 Ops/s $\color{#35bf28}+0.98\%$
test_td3_speed[False-None] 8.3588ms 8.2873ms 120.6668 Ops/s 121.4168 Ops/s $\color{#d91a1a}-0.62\%$
test_td3_speed[False-backward] 11.4930ms 11.2102ms 89.2044 Ops/s 89.3368 Ops/s $\color{#d91a1a}-0.15\%$
test_td3_speed[True-None] 1.9784ms 1.9308ms 517.9262 Ops/s 505.6019 Ops/s $\color{#35bf28}+2.44\%$
test_td3_speed[True-backward] 4.0355ms 3.8150ms 262.1249 Ops/s 216.5542 Ops/s $\textbf{\color{#35bf28}+21.04\%}$
test_td3_speed[reduce-overhead-None] 1.9424ms 1.8924ms 528.4287 Ops/s 526.6660 Ops/s $\color{#35bf28}+0.33\%$
test_cql_speed[False-None] 30.7688ms 27.5771ms 36.2620 Ops/s 37.0908 Ops/s $\color{#d91a1a}-2.23\%$
test_cql_speed[False-backward] 41.7916ms 37.2787ms 26.8250 Ops/s 26.7935 Ops/s $\color{#35bf28}+0.12\%$
test_cql_speed[True-None] 13.3051ms 12.9644ms 77.1342 Ops/s 76.4616 Ops/s $\color{#35bf28}+0.88\%$
test_cql_speed[True-backward] 19.4837ms 19.1105ms 52.3272 Ops/s 51.6523 Ops/s $\color{#35bf28}+1.31\%$
test_cql_speed[reduce-overhead-None] 16.0954ms 13.1013ms 76.3285 Ops/s 68.9470 Ops/s $\textbf{\color{#35bf28}+10.71\%}$
test_a2c_speed[False-None] 5.8228ms 5.6079ms 178.3185 Ops/s 180.2399 Ops/s $\color{#d91a1a}-1.07\%$
test_a2c_speed[False-backward] 12.5522ms 12.2502ms 81.6314 Ops/s 81.9740 Ops/s $\color{#d91a1a}-0.42\%$
test_a2c_speed[True-None] 4.3954ms 3.9245ms 254.8111 Ops/s 247.6531 Ops/s $\color{#35bf28}+2.89\%$
test_a2c_speed[True-backward] 9.3278ms 9.0357ms 110.6723 Ops/s 107.3296 Ops/s $\color{#35bf28}+3.11\%$
test_a2c_speed[reduce-overhead-None] 4.1160ms 3.9281ms 254.5736 Ops/s 249.1491 Ops/s $\color{#35bf28}+2.18\%$
test_ppo_speed[False-None] 6.2941ms 6.1071ms 163.7450 Ops/s 166.1772 Ops/s $\color{#d91a1a}-1.46\%$
test_ppo_speed[False-backward] 13.2707ms 12.9768ms 77.0607 Ops/s 77.2416 Ops/s $\color{#d91a1a}-0.23\%$
test_ppo_speed[True-None] 4.2258ms 3.9499ms 253.1739 Ops/s 257.4699 Ops/s $\color{#d91a1a}-1.67\%$
test_ppo_speed[True-backward] 9.3109ms 8.9068ms 112.2737 Ops/s 110.3913 Ops/s $\color{#35bf28}+1.71\%$
test_ppo_speed[reduce-overhead-None] 4.3076ms 3.9018ms 256.2917 Ops/s 256.6729 Ops/s $\color{#d91a1a}-0.15\%$
test_reinforce_speed[False-None] 5.1473ms 4.8013ms 208.2760 Ops/s 208.9556 Ops/s $\color{#d91a1a}-0.33\%$
test_reinforce_speed[False-backward] 7.9667ms 7.7402ms 129.1954 Ops/s 129.3866 Ops/s $\color{#d91a1a}-0.15\%$
test_reinforce_speed[True-None] 3.4667ms 3.1146ms 321.0712 Ops/s 265.2638 Ops/s $\textbf{\color{#35bf28}+21.04\%}$
test_reinforce_speed[True-backward] 8.5008ms 8.2271ms 121.5497 Ops/s 117.1607 Ops/s $\color{#35bf28}+3.75\%$
test_reinforce_speed[reduce-overhead-None] 3.4862ms 3.0762ms 325.0784 Ops/s 319.8975 Ops/s $\color{#35bf28}+1.62\%$
test_iql_speed[False-None] 26.3882ms 21.3990ms 46.7311 Ops/s 47.8956 Ops/s $\color{#d91a1a}-2.43\%$
test_iql_speed[False-backward] 33.3263ms 31.4325ms 31.8142 Ops/s 31.6477 Ops/s $\color{#35bf28}+0.53\%$
test_iql_speed[True-None] 11.3674ms 8.9374ms 111.8896 Ops/s 111.2392 Ops/s $\color{#35bf28}+0.58\%$
test_iql_speed[True-backward] 17.7127ms 17.1037ms 58.4670 Ops/s 56.7252 Ops/s $\color{#35bf28}+3.07\%$
test_iql_speed[reduce-overhead-None] 9.1135ms 8.8301ms 113.2487 Ops/s 112.6218 Ops/s $\color{#35bf28}+0.56\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.2141ms 6.0352ms 165.6938 Ops/s 161.1386 Ops/s $\color{#35bf28}+2.83\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 3.0941ms 0.3407ms 2.9348 KOps/s 3.0946 KOps/s $\textbf{\color{#d91a1a}-5.16\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.5519ms 0.2900ms 3.4479 KOps/s 3.5409 KOps/s $\color{#d91a1a}-2.63\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.1041ms 5.8821ms 170.0069 Ops/s 168.9455 Ops/s $\color{#35bf28}+0.63\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.0676ms 0.3545ms 2.8205 KOps/s 2.8262 KOps/s $\color{#d91a1a}-0.20\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.5756ms 0.3216ms 3.1093 KOps/s 2.9551 KOps/s $\textbf{\color{#35bf28}+5.22\%}$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.6887ms 1.4130ms 707.7367 Ops/s 680.1547 Ops/s $\color{#35bf28}+4.06\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.6230ms 1.3067ms 765.2746 Ops/s 730.6633 Ops/s $\color{#35bf28}+4.74\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 9.8197ms 6.0827ms 164.3995 Ops/s 163.1147 Ops/s $\color{#35bf28}+0.79\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.9353ms 0.4798ms 2.0840 KOps/s 1.9357 KOps/s $\textbf{\color{#35bf28}+7.66\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.8391ms 0.5474ms 1.8268 KOps/s 2.1320 KOps/s $\textbf{\color{#d91a1a}-14.32\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 5.9180ms 5.8022ms 172.3475 Ops/s 166.8815 Ops/s $\color{#35bf28}+3.28\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 2.2799ms 0.3677ms 2.7196 KOps/s 2.5240 KOps/s $\textbf{\color{#35bf28}+7.75\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.5485ms 0.3504ms 2.8542 KOps/s 3.5804 KOps/s $\textbf{\color{#d91a1a}-20.28\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.0030ms 5.7879ms 172.7749 Ops/s 169.1829 Ops/s $\color{#35bf28}+2.12\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 2.2542ms 0.3645ms 2.7431 KOps/s 2.9906 KOps/s $\textbf{\color{#d91a1a}-8.27\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.6082ms 0.3444ms 2.9038 KOps/s 2.8190 KOps/s $\color{#35bf28}+3.01\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.0953ms 5.9702ms 167.4987 Ops/s 163.0425 Ops/s $\color{#35bf28}+2.73\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 2.0298ms 0.5097ms 1.9620 KOps/s 2.1730 KOps/s $\textbf{\color{#d91a1a}-9.71\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.7148ms 0.4979ms 2.0084 KOps/s 2.2829 KOps/s $\textbf{\color{#d91a1a}-12.02\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 6.4805ms 5.0682ms 197.3090 Ops/s 193.0917 Ops/s $\color{#35bf28}+2.18\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 9.8217ms 2.1551ms 464.0167 Ops/s 448.1267 Ops/s $\color{#35bf28}+3.55\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 8.4718ms 1.3851ms 721.9862 Ops/s 1.1085 KOps/s $\textbf{\color{#d91a1a}-34.87\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 0.6354s 17.7107ms 56.4630 Ops/s 49.4588 Ops/s $\textbf{\color{#35bf28}+14.16\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 11.4602ms 2.0507ms 487.6361 Ops/s 493.2001 Ops/s $\color{#d91a1a}-1.13\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 7.0584ms 1.2370ms 808.4148 Ops/s 1.0505 KOps/s $\textbf{\color{#d91a1a}-23.04\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 6.9054ms 5.2877ms 189.1190 Ops/s 185.2036 Ops/s $\color{#35bf28}+2.11\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 14.1564ms 2.1366ms 468.0318 Ops/s 507.9810 Ops/s $\textbf{\color{#d91a1a}-7.86\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 1.3046ms 1.0833ms 923.0870 Ops/s 882.6640 Ops/s $\color{#35bf28}+4.58\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 44.6008ms 40.5000ms 24.6914 Ops/s 25.0072 Ops/s $\color{#d91a1a}-1.26\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 20.5152ms 19.0546ms 52.4809 Ops/s 54.6173 Ops/s $\color{#d91a1a}-3.91\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 47.1971ms 42.2458ms 23.6710 Ops/s 23.8792 Ops/s $\color{#d91a1a}-0.87\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 20.9636ms 19.2799ms 51.8675 Ops/s 53.0614 Ops/s $\color{#d91a1a}-2.25\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 45.1395ms 42.9148ms 23.3020 Ops/s 23.1557 Ops/s $\color{#35bf28}+0.63\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 0.5762s 31.7178ms 31.5281 Ops/s 49.8005 Ops/s $\textbf{\color{#d91a1a}-36.69\%}$
test_storage_write_lazystack[50-img_shape0-small] 0.8576ms 0.2307ms 4.3345 KOps/s 4.4663 KOps/s $\color{#d91a1a}-2.95\%$
test_storage_write_lazystack[100-img_shape1-atari] 1.5775ms 1.4272ms 700.6950 Ops/s 689.8375 Ops/s $\color{#35bf28}+1.57\%$
test_storage_write_lazystack[100-img_shape2-large_img] 2.8224ms 2.3548ms 424.6664 Ops/s 415.3645 Ops/s $\color{#35bf28}+2.24\%$
test_storage_write_lazystack[200-img_shape3-large_batch] 3.1258ms 2.9968ms 333.6936 Ops/s 330.1909 Ops/s $\color{#35bf28}+1.06\%$
test_storage_write_contiguous[50-img_shape0-small] 0.5266ms 0.1400ms 7.1449 KOps/s 7.3776 KOps/s $\color{#d91a1a}-3.15\%$
test_storage_write_contiguous[100-img_shape1-atari] 0.3259ms 0.1971ms 5.0738 KOps/s 5.3462 KOps/s $\textbf{\color{#d91a1a}-5.09\%}$
test_storage_write_contiguous[100-img_shape2-large_img] 2.1661ms 1.7338ms 576.7742 Ops/s 560.1306 Ops/s $\color{#35bf28}+2.97\%$
test_storage_write_contiguous[200-img_shape3-large_batch] 1.4536ms 1.3227ms 756.0267 Ops/s 756.7658 Ops/s $\color{#d91a1a}-0.10\%$
test_collector_stack_then_write[50-img_shape0-small] 1.3251ms 1.1533ms 867.0542 Ops/s 883.9678 Ops/s $\color{#d91a1a}-1.91\%$
test_collector_stack_then_write[100-img_shape1-atari] 3.7802ms 3.6252ms 275.8506 Ops/s 277.5263 Ops/s $\color{#d91a1a}-0.60\%$
test_collector_stack_then_write[100-img_shape2-large_img] 11.1347ms 5.7792ms 173.0329 Ops/s 170.0029 Ops/s $\color{#35bf28}+1.78\%$
test_collector_stack_then_write[200-img_shape3-large_batch] 7.7757ms 7.1322ms 140.2086 Ops/s 133.7130 Ops/s $\color{#35bf28}+4.86\%$
test_collector_lazystack_then_write[50-img_shape0-small] 0.4274ms 0.2827ms 3.5379 KOps/s 3.5881 KOps/s $\color{#d91a1a}-1.40\%$
test_collector_lazystack_then_write[100-img_shape1-atari] 1.6757ms 1.5462ms 646.7635 Ops/s 634.2417 Ops/s $\color{#35bf28}+1.97\%$
test_collector_lazystack_then_write[100-img_shape2-large_img] 2.9075ms 2.4533ms 407.6185 Ops/s 396.5157 Ops/s $\color{#35bf28}+2.80\%$
test_collector_lazystack_then_write[200-img_shape3-large_batch] 3.3753ms 3.1916ms 313.3262 Ops/s 307.3383 Ops/s $\color{#35bf28}+1.95\%$
test_collector_without_rb[100-img_shape0-atari] 34.1589ms 32.9072ms 30.3885 Ops/s 30.7172 Ops/s $\color{#d91a1a}-1.07\%$
test_collector_without_rb[200-img_shape1-large_batch] 66.0794ms 64.6596ms 15.4656 Ops/s 15.5404 Ops/s $\color{#d91a1a}-0.48\%$
test_collector_with_rb[100-img_shape0-atari] 38.6810ms 37.3885ms 26.7462 Ops/s 26.6573 Ops/s $\color{#35bf28}+0.33\%$
test_collector_with_rb[200-img_shape1-large_batch] 74.7194ms 73.3297ms 13.6370 Ops/s 13.6285 Ops/s $\color{#35bf28}+0.06\%$
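
The Change column appears consistent with the relative difference between the Ops column (this PR) and the Ops on Repo HEAD column. The bolded entries all exceed 5% in magnitude, and there are 10 bolded gains and 11 bolded losses, matching the Improved and Worsened counts in the summary. Both the formula and the 5% cutoff are inferred from the numbers, not stated by the bot. A small sketch that rechecks three rows under that assumption (`relative_change` and `is_flagged` are illustrative helper names):

```python
def relative_change(ops_pr: float, ops_head: float) -> float:
    """Percentage change in throughput of the PR relative to repo HEAD."""
    return (ops_pr - ops_head) / ops_head * 100.0


def is_flagged(change_pct: float, threshold_pct: float = 5.0) -> bool:
    """True when a change would be bolded, assuming the inferred 5% cutoff."""
    return abs(change_pct) > threshold_pct


# Arguments are (Ops, Ops on Repo HEAD) in Ops/s; 1.1085 KOps/s == 1108.5 Ops/s.
td3 = relative_change(262.1249, 216.5542)       # test_td3_speed[True-backward]
rb_populate = relative_change(721.9862, 1108.5)  # test_rb_populate[...LazyTensorStorage-RandomSampler-400]
ppo = relative_change(163.7450, 166.1772)        # test_ppo_speed[False-None]

for name, change in [("td3", td3), ("rb_populate", rb_populate), ("ppo", ppo)]:
    print(f"{name}: {change:+.2f}% flagged={is_flagged(change)}")
```

Because the table shows rounded throughput values, a recomputed percentage can differ from the printed one in the last digit.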

def _step(self, tensordict: TensorDictBase) -> TensorDictBase:
    # No need to clone here because inv does it already
    # tensordict = tensordict.clone(False)
    if self.base_env._trust_step_output:
Collaborator Author


How do we guarantee that there are no partial steps and such? Isn't this a bit of a footgun?
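
To make the concern concrete, here is a toy illustration of how a trusted fast path can silently ignore a partial-step request. It is a sketch under assumptions: the functions, the `mask` argument, and the guard are hypothetical and do not reflect how TorchRL represents partial steps or how this PR handles them.

```python
def step_all(obs: list, action: list) -> list:
    # Advance every sub-environment unconditionally.
    return [o + a for o, a in zip(obs, action)]


def step(obs: list, action: list, mask=None, trust: bool = False) -> list:
    """mask[i] == False means sub-environment i must not advance."""
    if trust or mask is None:
        # Fast path: the mask is never consulted when trust is set.
        return step_all(obs, action)
    stepped = step_all(obs, action)
    # Slow path: keep the old observation wherever the mask is False.
    return [s if m else o for s, o, m in zip(stepped, obs, mask)]


def guarded_step(obs: list, action: list, mask=None, trust: bool = False) -> list:
    # One possible guard: honor the trust flag only when no mask is supplied.
    return step(obs, action, mask, trust=trust and mask is None)


obs, action, mask = [0, 0], [1, 1], [True, False]
correct = step(obs, action, mask)                     # respects the mask
footgun = step(obs, action, mask, trust=True)         # mask silently ignored
guarded = guarded_step(obs, action, mask, trust=True)  # falls back to the slow path
```

The guard costs one extra condition per step, which is far cheaper than full validation but still not free; whether that trade-off is acceptable is the question the comment raises.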

[ghstack-poisoned]
@vmoens vmoens closed this Apr 11, 2026

Labels

CLA Signed, Performance, Transforms
