[Performance] Add fast path for step() and TransformedEnv._step() when _trust_step_output is set #3565
Closed
vmoens wants to merge 2 commits into gh/vmoens/246/base
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3565
Note: Links to docs will display an error until the docs builds have been completed.
❗ 1 Active SEV
There is 1 currently active SEV. If your PR is affected, please view it below:
❌ 3 New Failures, 14 Pending
As of commit 9d0068c with merge base a4301ee:
NEW FAILURES - The following jobs have failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
vmoens added a commit that referenced this pull request Mar 23, 2026
…n _trust_step_output is set

When _trust_step_output is True, EnvBase.step() skips _assert_tensordict_shape, partial_steps handling, next_preset logic, and _step_proc_data. Similarly, TransformedEnv._step() skips partial_steps, next_preset, and _complete_done. This eliminates all per-step Python validation overhead for well-behaved envs.

Made-with: Cursor
ghstack-source-id: 52ff860
Pull-Request: #3565
This was referenced Mar 23, 2026
Contributor
| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---|---|---|---|---|
| test_tensor_to_bytestream_speed[pickle] | 80.9338μs | 80.1621μs | 12.4747 KOps/s | 12.2956 KOps/s | |
| test_tensor_to_bytestream_speed[torch.save] | 0.1406ms | 0.1403ms | 7.1287 KOps/s | 6.8444 KOps/s | |
| test_tensor_to_bytestream_speed[untyped_storage] | 0.1110s | 0.1106s | 9.0401 Ops/s | 8.8305 Ops/s | |
| test_tensor_to_bytestream_speed[numpy] | 2.5594μs | 2.5511μs | 391.9899 KOps/s | 377.3300 KOps/s | |
| test_tensor_to_bytestream_speed[safetensors] | 36.3099μs | 36.1425μs | 27.6683 KOps/s | 26.7156 KOps/s | |
| test_simple | 0.5480s | 0.5465s | 1.8300 Ops/s | 1.7416 Ops/s | |
| test_transformed | 1.0877s | 1.0868s | 0.9202 Ops/s | 0.8907 Ops/s | |
| test_serial | 1.7047s | 1.7018s | 0.5876 Ops/s | 0.5780 Ops/s | |
| test_parallel | 1.0276s | 1.0219s | 0.9786 Ops/s | 0.9465 Ops/s | |
| test_step_mdp_speed[True-True-True-True-True] | 0.2767ms | 41.4369μs | 24.1331 KOps/s | 24.4066 KOps/s | |
| test_step_mdp_speed[True-True-True-True-False] | 60.5110μs | 22.9578μs | 43.5582 KOps/s | 43.3383 KOps/s | |
| test_step_mdp_speed[True-True-True-False-True] | 54.9210μs | 24.1120μs | 41.4731 KOps/s | 42.5462 KOps/s | |
| test_step_mdp_speed[True-True-True-False-False] | 47.4410μs | 12.9636μs | 77.1393 KOps/s | 77.6405 KOps/s | |
| test_step_mdp_speed[True-True-False-True-True] | 80.1710μs | 44.4392μs | 22.5027 KOps/s | 22.6066 KOps/s | |
| test_step_mdp_speed[True-True-False-True-False] | 66.2110μs | 25.7332μs | 38.8603 KOps/s | 38.3945 KOps/s | |
| test_step_mdp_speed[True-True-False-False-True] | 55.3910μs | 26.0921μs | 38.3258 KOps/s | 37.6782 KOps/s | |
| test_step_mdp_speed[True-True-False-False-False] | 48.9010μs | 15.4194μs | 64.8532 KOps/s | 63.8476 KOps/s | |
| test_step_mdp_speed[True-False-True-True-True] | 0.1254ms | 46.9472μs | 21.3005 KOps/s | 21.0892 KOps/s | |
| test_step_mdp_speed[True-False-True-True-False] | 64.0710μs | 28.2316μs | 35.4213 KOps/s | 35.2619 KOps/s | |
| test_step_mdp_speed[True-False-True-False-True] | 0.4400ms | 26.3321μs | 37.9765 KOps/s | 38.1030 KOps/s | |
| test_step_mdp_speed[True-False-True-False-False] | 0.4422ms | 15.3216μs | 65.2671 KOps/s | 63.6939 KOps/s | |
| test_step_mdp_speed[True-False-False-True-True] | 85.1510μs | 48.9653μs | 20.4226 KOps/s | 20.2747 KOps/s | |
| test_step_mdp_speed[True-False-False-True-False] | 0.4544ms | 30.6681μs | 32.6072 KOps/s | 32.0471 KOps/s | |
| test_step_mdp_speed[True-False-False-False-True] | 0.4797ms | 28.2644μs | 35.3802 KOps/s | 34.2575 KOps/s | |
| test_step_mdp_speed[True-False-False-False-False] | 0.4399ms | 17.8934μs | 55.8866 KOps/s | 55.2015 KOps/s | |
| test_step_mdp_speed[False-True-True-True-True] | 82.8110μs | 47.2995μs | 21.1419 KOps/s | 21.1282 KOps/s | |
| test_step_mdp_speed[False-True-True-True-False] | 0.4859ms | 27.9270μs | 35.8076 KOps/s | 35.3326 KOps/s | |
| test_step_mdp_speed[False-True-True-False-True] | 2.3599ms | 30.3195μs | 32.9821 KOps/s | 32.8665 KOps/s | |
| test_step_mdp_speed[False-True-True-False-False] | 51.9410μs | 17.0904μs | 58.5123 KOps/s | 58.3812 KOps/s | |
| test_step_mdp_speed[False-True-False-True-True] | 0.5042ms | 49.7670μs | 20.0936 KOps/s | 20.5774 KOps/s | |
| test_step_mdp_speed[False-True-False-True-False] | 0.4542ms | 30.9437μs | 32.3168 KOps/s | 32.3424 KOps/s | |
| test_step_mdp_speed[False-True-False-False-True] | 0.4657ms | 31.9049μs | 31.3432 KOps/s | 31.7038 KOps/s | |
| test_step_mdp_speed[False-True-False-False-False] | 47.7610μs | 19.5467μs | 51.1597 KOps/s | 51.3692 KOps/s | |
| test_step_mdp_speed[False-False-True-True-True] | 0.4878ms | 52.9565μs | 18.8834 KOps/s | 19.1776 KOps/s | |
| test_step_mdp_speed[False-False-True-True-False] | 0.4766ms | 33.2106μs | 30.1109 KOps/s | 29.6786 KOps/s | |
| test_step_mdp_speed[False-False-True-False-True] | 0.4592ms | 31.6724μs | 31.5732 KOps/s | 30.8979 KOps/s | |
| test_step_mdp_speed[False-False-True-False-False] | 47.0700μs | 19.4859μs | 51.3192 KOps/s | 50.6200 KOps/s | |
| test_step_mdp_speed[False-False-False-True-True] | 0.4813ms | 53.2620μs | 18.7751 KOps/s | 18.7571 KOps/s | |
| test_step_mdp_speed[False-False-False-True-False] | 0.4550ms | 35.5916μs | 28.0965 KOps/s | 27.7295 KOps/s | |
| test_step_mdp_speed[False-False-False-False-True] | 0.4573ms | 33.4630μs | 29.8837 KOps/s | 29.4384 KOps/s | |
| test_step_mdp_speed[False-False-False-False-False] | 53.1110μs | 22.0182μs | 45.4169 KOps/s | 44.8066 KOps/s | |
| test_non_tensor_env_rollout_speed[1000-single-True] | 0.7271s | 0.7218s | 1.3854 Ops/s | 1.3306 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-single-False] | 0.7093s | 0.6021s | 1.6607 Ops/s | 1.6293 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] | 1.7388s | 1.6483s | 0.6067 Ops/s | 0.6079 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] | 1.5108s | 1.4274s | 0.7006 Ops/s | 0.7000 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-buffers-True] | 1.9866s | 1.9028s | 0.5255 Ops/s | 0.5263 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-serial-buffers-False] | 1.7742s | 1.6831s | 0.5942 Ops/s | 0.5965 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] | 4.6946s | 4.6115s | 0.2169 Ops/s | 0.2171 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] | 4.6212s | 4.3860s | 0.2280 Ops/s | 0.2261 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] | 1.9662s | 1.8619s | 0.5371 Ops/s | 0.5298 Ops/s | |
| test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] | 1.8223s | 1.6428s | 0.6087 Ops/s | 0.6249 Ops/s | |
| test_values[generalized_advantage_estimate-True-True] | 10.1399ms | 9.9036ms | 100.9738 Ops/s | 99.0387 Ops/s | |
| test_values[vec_generalized_advantage_estimate-True-True] | 19.7717ms | 17.3217ms | 57.7312 Ops/s | 56.7647 Ops/s | |
| test_values[td0_return_estimate-False-False] | 0.2247ms | 0.1310ms | 7.6344 KOps/s | 7.6365 KOps/s | |
| test_values[td1_return_estimate-False-False] | 27.4345ms | 27.0658ms | 36.9469 Ops/s | 35.8437 Ops/s | |
| test_values[vec_td1_return_estimate-False-False] | 17.8404ms | 17.3914ms | 57.4997 Ops/s | 56.3738 Ops/s | |
| test_values[td_lambda_return_estimate-True-False] | 41.0136ms | 40.2998ms | 24.8140 Ops/s | 24.1669 Ops/s | |
| test_values[vec_td_lambda_return_estimate-True-False] | 18.6377ms | 17.3831ms | 57.5271 Ops/s | 56.9047 Ops/s | |
| test_gae_speed[generalized_advantage_estimate-False-1-512] | 8.8322ms | 8.7444ms | 114.3584 Ops/s | 112.0286 Ops/s | |
| test_gae_speed[vec_generalized_advantage_estimate-True-1-512] | 1.7995ms | 1.5411ms | 648.8800 Ops/s | 658.7524 Ops/s | |
| test_gae_speed[vec_generalized_advantage_estimate-False-1-512] | 0.5717ms | 0.4202ms | 2.3797 KOps/s | 2.3694 KOps/s | |
| test_gae_speed[vec_generalized_advantage_estimate-True-32-512] | 30.9745ms | 30.2361ms | 33.0731 Ops/s | 29.2513 Ops/s | |
| test_gae_speed[vec_generalized_advantage_estimate-False-32-512] | 1.9591ms | 1.7380ms | 575.3800 Ops/s | 571.8212 Ops/s | |
| test_dqn_speed[False-None] | 1.5132ms | 1.4254ms | 701.5362 Ops/s | 704.4545 Ops/s | |
| test_dqn_speed[False-backward] | 2.0117ms | 1.9631ms | 509.4105 Ops/s | 514.0191 Ops/s | |
| test_dqn_speed[True-None] | 1.5108ms | 0.5892ms | 1.6971 KOps/s | 1.7218 KOps/s | |
| test_dqn_speed[True-backward] | 1.0898ms | 1.0597ms | 943.6410 Ops/s | 933.6941 Ops/s | |
| test_dqn_speed[reduce-overhead-None] | 0.6525ms | 0.5625ms | 1.7779 KOps/s | 1.7652 KOps/s | |
| test_ddpg_speed[False-None] | 3.2581ms | 2.9217ms | 342.2625 Ops/s | 348.5446 Ops/s | |
| test_ddpg_speed[False-backward] | 4.3874ms | 4.1820ms | 239.1190 Ops/s | 242.2997 Ops/s | |
| test_ddpg_speed[True-None] | 1.6970ms | 1.4785ms | 676.3748 Ops/s | 658.2724 Ops/s | |
| test_ddpg_speed[True-backward] | 2.6774ms | 2.5465ms | 392.6923 Ops/s | 351.0471 Ops/s | |
| test_ddpg_speed[reduce-overhead-None] | 1.6139ms | 1.4696ms | 680.4757 Ops/s | 682.7995 Ops/s | |
| test_sac_speed[False-None] | 8.8179ms | 8.2455ms | 121.2785 Ops/s | 123.0920 Ops/s | |
| test_sac_speed[False-backward] | 12.2318ms | 11.6088ms | 86.1414 Ops/s | 87.0367 Ops/s | |
| test_sac_speed[True-None] | 2.5001ms | 2.2835ms | 437.9206 Ops/s | 430.9479 Ops/s | |
| test_sac_speed[True-backward] | 4.4618ms | 4.3055ms | 232.2586 Ops/s | 224.5339 Ops/s | |
| test_sac_speed[reduce-overhead-None] | 2.3847ms | 2.2734ms | 439.8750 Ops/s | 455.2801 Ops/s | |
| test_redq_speed[False-None] | 14.0315ms | 10.9985ms | 90.9213 Ops/s | 89.2160 Ops/s | |
| test_redq_speed[False-backward] | 19.8768ms | 18.8356ms | 53.0911 Ops/s | 54.3606 Ops/s | |
| test_redq_speed[True-None] | 5.1877ms | 4.8186ms | 207.5294 Ops/s | 204.2387 Ops/s | |
| test_redq_speed[reduce-overhead-None] | 5.0047ms | 4.7204ms | 211.8486 Ops/s | 208.1884 Ops/s | |
| test_redq_deprec_speed[False-None] | 12.1804ms | 11.6565ms | 85.7888 Ops/s | 87.3498 Ops/s | |
| test_redq_deprec_speed[False-backward] | 17.2985ms | 16.8107ms | 59.4861 Ops/s | 60.9095 Ops/s | |
| test_redq_deprec_speed[True-None] | 4.0532ms | 3.8426ms | 260.2392 Ops/s | 263.9104 Ops/s | |
| test_redq_deprec_speed[True-backward] | 8.1226ms | 7.8725ms | 127.0249 Ops/s | 123.9510 Ops/s | |
| test_redq_deprec_speed[reduce-overhead-None] | 3.9720ms | 3.7339ms | 267.8192 Ops/s | 265.2092 Ops/s | |
| test_td3_speed[False-None] | 8.3588ms | 8.2873ms | 120.6668 Ops/s | 121.4168 Ops/s | |
| test_td3_speed[False-backward] | 11.4930ms | 11.2102ms | 89.2044 Ops/s | 89.3368 Ops/s | |
| test_td3_speed[True-None] | 1.9784ms | 1.9308ms | 517.9262 Ops/s | 505.6019 Ops/s | |
| test_td3_speed[True-backward] | 4.0355ms | 3.8150ms | 262.1249 Ops/s | 216.5542 Ops/s | |
| test_td3_speed[reduce-overhead-None] | 1.9424ms | 1.8924ms | 528.4287 Ops/s | 526.6660 Ops/s | |
| test_cql_speed[False-None] | 30.7688ms | 27.5771ms | 36.2620 Ops/s | 37.0908 Ops/s | |
| test_cql_speed[False-backward] | 41.7916ms | 37.2787ms | 26.8250 Ops/s | 26.7935 Ops/s | |
| test_cql_speed[True-None] | 13.3051ms | 12.9644ms | 77.1342 Ops/s | 76.4616 Ops/s | |
| test_cql_speed[True-backward] | 19.4837ms | 19.1105ms | 52.3272 Ops/s | 51.6523 Ops/s | |
| test_cql_speed[reduce-overhead-None] | 16.0954ms | 13.1013ms | 76.3285 Ops/s | 68.9470 Ops/s | |
| test_a2c_speed[False-None] | 5.8228ms | 5.6079ms | 178.3185 Ops/s | 180.2399 Ops/s | |
| test_a2c_speed[False-backward] | 12.5522ms | 12.2502ms | 81.6314 Ops/s | 81.9740 Ops/s | |
| test_a2c_speed[True-None] | 4.3954ms | 3.9245ms | 254.8111 Ops/s | 247.6531 Ops/s | |
| test_a2c_speed[True-backward] | 9.3278ms | 9.0357ms | 110.6723 Ops/s | 107.3296 Ops/s | |
| test_a2c_speed[reduce-overhead-None] | 4.1160ms | 3.9281ms | 254.5736 Ops/s | 249.1491 Ops/s | |
| test_ppo_speed[False-None] | 6.2941ms | 6.1071ms | 163.7450 Ops/s | 166.1772 Ops/s | |
| test_ppo_speed[False-backward] | 13.2707ms | 12.9768ms | 77.0607 Ops/s | 77.2416 Ops/s | |
| test_ppo_speed[True-None] | 4.2258ms | 3.9499ms | 253.1739 Ops/s | 257.4699 Ops/s | |
| test_ppo_speed[True-backward] | 9.3109ms | 8.9068ms | 112.2737 Ops/s | 110.3913 Ops/s | |
| test_ppo_speed[reduce-overhead-None] | 4.3076ms | 3.9018ms | 256.2917 Ops/s | 256.6729 Ops/s | |
| test_reinforce_speed[False-None] | 5.1473ms | 4.8013ms | 208.2760 Ops/s | 208.9556 Ops/s | |
| test_reinforce_speed[False-backward] | 7.9667ms | 7.7402ms | 129.1954 Ops/s | 129.3866 Ops/s | |
| test_reinforce_speed[True-None] | 3.4667ms | 3.1146ms | 321.0712 Ops/s | 265.2638 Ops/s | |
| test_reinforce_speed[True-backward] | 8.5008ms | 8.2271ms | 121.5497 Ops/s | 117.1607 Ops/s | |
| test_reinforce_speed[reduce-overhead-None] | 3.4862ms | 3.0762ms | 325.0784 Ops/s | 319.8975 Ops/s | |
| test_iql_speed[False-None] | 26.3882ms | 21.3990ms | 46.7311 Ops/s | 47.8956 Ops/s | |
| test_iql_speed[False-backward] | 33.3263ms | 31.4325ms | 31.8142 Ops/s | 31.6477 Ops/s | |
| test_iql_speed[True-None] | 11.3674ms | 8.9374ms | 111.8896 Ops/s | 111.2392 Ops/s | |
| test_iql_speed[True-backward] | 17.7127ms | 17.1037ms | 58.4670 Ops/s | 56.7252 Ops/s | |
| test_iql_speed[reduce-overhead-None] | 9.1135ms | 8.8301ms | 113.2487 Ops/s | 112.6218 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 6.2141ms | 6.0352ms | 165.6938 Ops/s | 161.1386 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 3.0941ms | 0.3407ms | 2.9348 KOps/s | 3.0946 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.5519ms | 0.2900ms | 3.4479 KOps/s | 3.5409 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 6.1041ms | 5.8821ms | 170.0069 Ops/s | 168.9455 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 1.0676ms | 0.3545ms | 2.8205 KOps/s | 2.8262 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.5756ms | 0.3216ms | 3.1093 KOps/s | 2.9551 KOps/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] | 1.6887ms | 1.4130ms | 707.7367 Ops/s | 680.1547 Ops/s | |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] | 1.6230ms | 1.3067ms | 765.2746 Ops/s | 730.6633 Ops/s | |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 9.8197ms | 6.0827ms | 164.3995 Ops/s | 163.1147 Ops/s | |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 1.9353ms | 0.4798ms | 2.0840 KOps/s | 1.9357 KOps/s | |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.8391ms | 0.5474ms | 1.8268 KOps/s | 2.1320 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 5.9180ms | 5.8022ms | 172.3475 Ops/s | 166.8815 Ops/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 2.2799ms | 0.3677ms | 2.7196 KOps/s | 2.5240 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.5485ms | 0.3504ms | 2.8542 KOps/s | 3.5804 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 6.0030ms | 5.7879ms | 172.7749 Ops/s | 169.1829 Ops/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 2.2542ms | 0.3645ms | 2.7431 KOps/s | 2.9906 KOps/s | |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.6082ms | 0.3444ms | 2.9038 KOps/s | 2.8190 KOps/s | |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 6.0953ms | 5.9702ms | 167.4987 Ops/s | 163.0425 Ops/s | |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 2.0298ms | 0.5097ms | 1.9620 KOps/s | 2.1730 KOps/s | |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.7148ms | 0.4979ms | 2.0084 KOps/s | 2.2829 KOps/s | |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] | 6.4805ms | 5.0682ms | 197.3090 Ops/s | 193.0917 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] | 9.8217ms | 2.1551ms | 464.0167 Ops/s | 448.1267 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] | 8.4718ms | 1.3851ms | 721.9862 Ops/s | 1.1085 KOps/s | |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] | 0.6354s | 17.7107ms | 56.4630 Ops/s | 49.4588 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] | 11.4602ms | 2.0507ms | 487.6361 Ops/s | 493.2001 Ops/s | |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] | 7.0584ms | 1.2370ms | 808.4148 Ops/s | 1.0505 KOps/s | |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] | 6.9054ms | 5.2877ms | 189.1190 Ops/s | 185.2036 Ops/s | |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] | 14.1564ms | 2.1366ms | 468.0318 Ops/s | 507.9810 Ops/s | |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] | 1.3046ms | 1.0833ms | 923.0870 Ops/s | 882.6640 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] | 44.6008ms | 40.5000ms | 24.6914 Ops/s | 25.0072 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] | 20.5152ms | 19.0546ms | 52.4809 Ops/s | 54.6173 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] | 47.1971ms | 42.2458ms | 23.6710 Ops/s | 23.8792 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] | 20.9636ms | 19.2799ms | 51.8675 Ops/s | 53.0614 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] | 45.1395ms | 42.9148ms | 23.3020 Ops/s | 23.1557 Ops/s | |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] | 0.5762s | 31.7178ms | 31.5281 Ops/s | 49.8005 Ops/s | |
| test_storage_write_lazystack[50-img_shape0-small] | 0.8576ms | 0.2307ms | 4.3345 KOps/s | 4.4663 KOps/s | |
| test_storage_write_lazystack[100-img_shape1-atari] | 1.5775ms | 1.4272ms | 700.6950 Ops/s | 689.8375 Ops/s | |
| test_storage_write_lazystack[100-img_shape2-large_img] | 2.8224ms | 2.3548ms | 424.6664 Ops/s | 415.3645 Ops/s | |
| test_storage_write_lazystack[200-img_shape3-large_batch] | 3.1258ms | 2.9968ms | 333.6936 Ops/s | 330.1909 Ops/s | |
| test_storage_write_contiguous[50-img_shape0-small] | 0.5266ms | 0.1400ms | 7.1449 KOps/s | 7.3776 KOps/s | |
| test_storage_write_contiguous[100-img_shape1-atari] | 0.3259ms | 0.1971ms | 5.0738 KOps/s | 5.3462 KOps/s | |
| test_storage_write_contiguous[100-img_shape2-large_img] | 2.1661ms | 1.7338ms | 576.7742 Ops/s | 560.1306 Ops/s | |
| test_storage_write_contiguous[200-img_shape3-large_batch] | 1.4536ms | 1.3227ms | 756.0267 Ops/s | 756.7658 Ops/s | |
| test_collector_stack_then_write[50-img_shape0-small] | 1.3251ms | 1.1533ms | 867.0542 Ops/s | 883.9678 Ops/s | |
| test_collector_stack_then_write[100-img_shape1-atari] | 3.7802ms | 3.6252ms | 275.8506 Ops/s | 277.5263 Ops/s | |
| test_collector_stack_then_write[100-img_shape2-large_img] | 11.1347ms | 5.7792ms | 173.0329 Ops/s | 170.0029 Ops/s | |
| test_collector_stack_then_write[200-img_shape3-large_batch] | 7.7757ms | 7.1322ms | 140.2086 Ops/s | 133.7130 Ops/s | |
| test_collector_lazystack_then_write[50-img_shape0-small] | 0.4274ms | 0.2827ms | 3.5379 KOps/s | 3.5881 KOps/s | |
| test_collector_lazystack_then_write[100-img_shape1-atari] | 1.6757ms | 1.5462ms | 646.7635 Ops/s | 634.2417 Ops/s | |
| test_collector_lazystack_then_write[100-img_shape2-large_img] | 2.9075ms | 2.4533ms | 407.6185 Ops/s | 396.5157 Ops/s | |
| test_collector_lazystack_then_write[200-img_shape3-large_batch] | 3.3753ms | 3.1916ms | 313.3262 Ops/s | 307.3383 Ops/s | |
| test_collector_without_rb[100-img_shape0-atari] | 34.1589ms | 32.9072ms | 30.3885 Ops/s | 30.7172 Ops/s | |
| test_collector_without_rb[200-img_shape1-large_batch] | 66.0794ms | 64.6596ms | 15.4656 Ops/s | 15.5404 Ops/s | |
| test_collector_with_rb[100-img_shape0-atari] | 38.6810ms | 37.3885ms | 26.7462 Ops/s | 26.6573 Ops/s | |
| test_collector_with_rb[200-img_shape1-large_batch] | 74.7194ms | 73.3297ms | 13.6370 Ops/s | 13.6285 Ops/s |
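
The Change column is blank in the table above. As a rough aid for reading it, the relative difference between the Ops and Ops on Repo HEAD columns can be computed by hand. The sketch below assumes the change is defined as `(Ops - HEAD) / HEAD`, which may differ from what the benchmark bot reports; `relative_change` is a hypothetical helper, and the four rows are copied from the table purely for illustration.

```python
def relative_change(ops_pr: float, ops_head: float) -> float:
    """Percent change in throughput of the PR run relative to repo HEAD."""
    return (ops_pr - ops_head) / ops_head * 100.0


# (Ops, Ops on Repo HEAD) in Ops/s, copied from the table above.
rows = {
    "test_simple": (1.8300, 1.7416),
    "test_transformed": (0.9202, 0.8907),
    "test_serial": (0.5876, 0.5780),
    "test_parallel": (0.9786, 0.9465),
}

# Assumed definition of "Change": relative difference in Ops.
changes = {name: relative_change(pr, head) for name, (pr, head) in rows.items()}

for name, pct in changes.items():
    print(f"{name}: {pct:+.2f}%")
```

Under this definition the four rows come out between roughly 1.5% and 5% in favour of the PR. Several unrelated rows in the table move by similar or larger amounts in both directions, so differences of this size from a single CI run should be read with caution.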
vmoens commented Mar 26, 2026
    def _step(self, tensordict: TensorDictBase) -> TensorDictBase:
        # No need to clone here because inv does it already
        # tensordict = tensordict.clone(False)
        if self.base_env._trust_step_output:
How do we guarantee that there are no partial steps and such? Isn't this a bit of a footgun?
Stack from ghstack (oldest at bottom):
When _trust_step_output is True, EnvBase.step() skips _assert_tensordict_shape,
partial_steps handling, next_preset logic, and _step_proc_data. Similarly,
TransformedEnv._step() skips partial_steps, next_preset, and _complete_done.
This eliminates all per-step Python validation overhead for well-behaved envs.
Made-with: Cursor
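
To make the description above concrete, here is a minimal sketch of the trust-flag pattern in plain Python. `ToyEnv`, `_check_input`, `_postprocess`, `checks_run` and the dict-based data are hypothetical stand-ins, not TorchRL's `EnvBase` or its actual code; only the `_trust_step_output` flag name is taken from the PR.

```python
class ToyEnv:
    """Hypothetical environment illustrating a trust-flag fast path (not TorchRL code)."""

    def __init__(self, batch_size=(), trust_step_output=False):
        self.batch_size = tuple(batch_size)
        self._trust_step_output = trust_step_output
        self.checks_run = 0  # counts validation/post-processing calls

    def _step(self, data):
        # Environment-specific transition: increment the observation.
        return {"observation": data["observation"] + 1, "done": False}

    def _check_input(self, data):
        # Stand-in for a per-step shape assertion.
        self.checks_run += 1
        if tuple(data["batch_size"]) != self.batch_size:
            raise RuntimeError("input batch size does not match the env")

    def _postprocess(self, out):
        # Stand-in for per-step output processing (e.g. filling done-like keys).
        self.checks_run += 1
        out.setdefault("terminated", out["done"])
        return out

    def step(self, data):
        if self._trust_step_output:
            # Fast path: no validation, no post-processing.
            return {**data, "next": self._step(data)}
        # Slow path: validate the input, then process the output.
        self._check_input(data)
        return {**data, "next": self._postprocess(self._step(data))}


good = {"observation": 0, "batch_size": ()}
bad = {"observation": 0, "batch_size": (3,)}  # wrong batch size on purpose

fast = ToyEnv(trust_step_output=True)
slow = ToyEnv(trust_step_output=False)

fast_out = fast.step(good)
slow_out = slow.step(good)
fast_bad_out = fast.step(bad)  # accepted silently on the fast path
```

In this toy version the fast path runs zero checks per step, while the slow path runs two. It also shows the concern raised in the review thread: the malformed `bad` input passes through `fast.step` without error, whereas `slow.step(bad)` would raise a `RuntimeError`. Whether the real fast path is safe depends on how and when `_trust_step_output` gets set, which this sketch does not model.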