
[Test] Add tests and benchmarks for collector throughput optimizations #3567

Closed
vmoens wants to merge 2 commits into gh/vmoens/248/base from gh/vmoens/248/head


[ghstack-poisoned]
@pytorch-bot

pytorch-bot bot commented Mar 24, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3567

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There are 1 currently active SEVs. If your PR is affected, please view them below:

❌ 4 New Failures, 1 Cancelled Job, 7 Pending, 2 Unrelated Failures

As of commit a0057af with merge base a4301ee:

NEW FAILURES - The following jobs have failed:

CANCELLED JOB - The following job was cancelled. Please retry:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

vmoens added a commit that referenced this pull request Mar 24, 2026
Cover all 7 performance features: _skip_maybe_reset, _StepMDP out= reuse,
_trust_step_output, update_traj_ids, combined optimization flags,
torch.compile fullgraph, and fast-path benchmarks.

Made-with: Cursor
ghstack-source-id: ad18afe
Pull-Request: #3567
@meta-cla meta-cla bot added the CLA Signed label Mar 24, 2026
@github-actions github-actions bot added the Tests, Benchmarks, and Collectors labels Mar 24, 2026
@github-actions
Contributor

github-actions bot commented Mar 24, 2026

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 174. Improved: $\large\color{#35bf28}17$. Worsened: $\large\color{#d91a1a}23$.

Expand to view detailed results
Name | Max | Mean | Ops | Ops on Repo HEAD | Change
test_tensor_to_bytestream_speed[pickle] 86.6211μs 85.8682μs 11.6458 KOps/s 12.3487 KOps/s $\textbf{\color{#d91a1a}-5.69\%}$
test_tensor_to_bytestream_speed[torch.save] 0.1500ms 0.1488ms 6.7190 KOps/s 7.0529 KOps/s $\color{#d91a1a}-4.73\%$
test_tensor_to_bytestream_speed[untyped_storage] 0.1059s 0.1054s 9.4863 Ops/s 9.2432 Ops/s $\color{#35bf28}+2.63\%$
test_tensor_to_bytestream_speed[numpy] 2.5983μs 2.5906μs 386.0180 KOps/s 381.8893 KOps/s $\color{#35bf28}+1.08\%$
test_tensor_to_bytestream_speed[safetensors] 39.5416μs 39.3272μs 25.4277 KOps/s 27.1560 KOps/s $\textbf{\color{#d91a1a}-6.36\%}$
test_simple 0.6831s 0.5788s 1.7277 Ops/s 1.7472 Ops/s $\color{#d91a1a}-1.11\%$
test_transformed 1.1387s 1.1107s 0.9003 Ops/s 0.8936 Ops/s $\color{#35bf28}+0.75\%$
test_serial 1.7775s 1.7451s 0.5730 Ops/s 0.5827 Ops/s $\color{#d91a1a}-1.67\%$
test_parallel 1.0259s 1.0242s 0.9763 Ops/s 0.9531 Ops/s $\color{#35bf28}+2.43\%$
test_step_mdp_speed[True-True-True-True-True] 0.1792ms 42.2874μs 23.6477 KOps/s 22.2946 KOps/s $\textbf{\color{#35bf28}+6.07\%}$
test_step_mdp_speed[True-True-True-True-False] 0.4704ms 23.5894μs 42.3919 KOps/s 42.5008 KOps/s $\color{#d91a1a}-0.26\%$
test_step_mdp_speed[True-True-True-False-True] 58.3810μs 23.9804μs 41.7007 KOps/s 38.5091 KOps/s $\textbf{\color{#35bf28}+8.29\%}$
test_step_mdp_speed[True-True-True-False-False] 38.1110μs 13.0316μs 76.7367 KOps/s 77.0612 KOps/s $\color{#d91a1a}-0.42\%$
test_step_mdp_speed[True-True-False-True-True] 0.4837ms 46.1219μs 21.6817 KOps/s 21.5106 KOps/s $\color{#35bf28}+0.80\%$
test_step_mdp_speed[True-True-False-True-False] 0.4563ms 26.0529μs 38.3835 KOps/s 37.2270 KOps/s $\color{#35bf28}+3.11\%$
test_step_mdp_speed[True-True-False-False-True] 0.4598ms 27.2867μs 36.6478 KOps/s 34.6826 KOps/s $\textbf{\color{#35bf28}+5.67\%}$
test_step_mdp_speed[True-True-False-False-False] 40.9210μs 17.4017μs 57.4658 KOps/s 64.1690 KOps/s $\textbf{\color{#d91a1a}-10.45\%}$
test_step_mdp_speed[True-False-True-True-True] 0.4942ms 53.8039μs 18.5860 KOps/s 19.6844 KOps/s $\textbf{\color{#d91a1a}-5.58\%}$
test_step_mdp_speed[True-False-True-True-False] 0.4569ms 28.8030μs 34.7186 KOps/s 34.7619 KOps/s $\color{#d91a1a}-0.12\%$
test_step_mdp_speed[True-False-True-False-True] 0.4548ms 27.1313μs 36.8578 KOps/s 34.9365 KOps/s $\textbf{\color{#35bf28}+5.50\%}$
test_step_mdp_speed[True-False-True-False-False] 38.8900μs 15.7352μs 63.5517 KOps/s 64.5837 KOps/s $\color{#d91a1a}-1.60\%$
test_step_mdp_speed[True-False-False-True-True] 0.5008ms 53.0556μs 18.8482 KOps/s 18.7668 KOps/s $\color{#35bf28}+0.43\%$
test_step_mdp_speed[True-False-False-True-False] 0.4709ms 31.2202μs 32.0305 KOps/s 32.0299 KOps/s $+0.00\%$
test_step_mdp_speed[True-False-False-False-True] 0.4624ms 29.7458μs 33.6182 KOps/s 32.1390 KOps/s $\color{#35bf28}+4.60\%$
test_step_mdp_speed[True-False-False-False-False] 64.7810μs 19.5456μs 51.1624 KOps/s 54.9365 KOps/s $\textbf{\color{#d91a1a}-6.87\%}$
test_step_mdp_speed[False-True-True-True-True] 0.4892ms 52.1460μs 19.1769 KOps/s 19.9260 KOps/s $\color{#d91a1a}-3.76\%$
test_step_mdp_speed[False-True-True-True-False] 0.4584ms 28.8672μs 34.6414 KOps/s 34.5123 KOps/s $\color{#35bf28}+0.37\%$
test_step_mdp_speed[False-True-True-False-True] 2.4873ms 34.4387μs 29.0371 KOps/s 31.3198 KOps/s $\textbf{\color{#d91a1a}-7.29\%}$
test_step_mdp_speed[False-True-True-False-False] 0.4650ms 19.0521μs 52.4875 KOps/s 56.6964 KOps/s $\textbf{\color{#d91a1a}-7.42\%}$
test_step_mdp_speed[False-True-False-True-True] 0.4960ms 56.4040μs 17.7292 KOps/s 19.1059 KOps/s $\textbf{\color{#d91a1a}-7.21\%}$
test_step_mdp_speed[False-True-False-True-False] 0.4657ms 34.0087μs 29.4042 KOps/s 32.1995 KOps/s $\textbf{\color{#d91a1a}-8.68\%}$
test_step_mdp_speed[False-True-False-False-True] 62.0810μs 33.1202μs 30.1931 KOps/s 28.9274 KOps/s $\color{#35bf28}+4.38\%$
test_step_mdp_speed[False-True-False-False-False] 0.4411ms 20.5144μs 48.7463 KOps/s 49.7545 KOps/s $\color{#d91a1a}-2.03\%$
test_step_mdp_speed[False-False-True-True-True] 0.4830ms 52.9284μs 18.8934 KOps/s 18.9699 KOps/s $\color{#d91a1a}-0.40\%$
test_step_mdp_speed[False-False-True-True-False] 0.4575ms 34.1469μs 29.2852 KOps/s 29.3540 KOps/s $\color{#d91a1a}-0.23\%$
test_step_mdp_speed[False-False-True-False-True] 65.9500μs 35.2289μs 28.3858 KOps/s 28.5764 KOps/s $\color{#d91a1a}-0.67\%$
test_step_mdp_speed[False-False-True-False-False] 37.8010μs 20.0692μs 49.8275 KOps/s 51.0772 KOps/s $\color{#d91a1a}-2.45\%$
test_step_mdp_speed[False-False-False-True-True] 0.4805ms 55.0603μs 18.1619 KOps/s 17.7774 KOps/s $\color{#35bf28}+2.16\%$
test_step_mdp_speed[False-False-False-True-False] 0.4591ms 36.8609μs 27.1290 KOps/s 27.1220 KOps/s $\color{#35bf28}+0.03\%$
test_step_mdp_speed[False-False-False-False-True] 0.4587ms 35.5362μs 28.1403 KOps/s 27.5939 KOps/s $\color{#35bf28}+1.98\%$
test_step_mdp_speed[False-False-False-False-False] 61.6510μs 22.3868μs 44.6693 KOps/s 44.3833 KOps/s $\color{#35bf28}+0.64\%$
test_step_and_maybe_reset_fast_path 87.2053ms 85.5260ms 11.6924 Ops/s 11.1366 Ops/s $\color{#35bf28}+4.99\%$
test_step_and_maybe_reset_normal 0.1055s 0.1040s 9.6171 Ops/s 9.2418 Ops/s $\color{#35bf28}+4.06\%$
test_non_tensor_env_rollout_speed[1000-single-True] 0.8852s 0.7564s 1.3220 Ops/s 1.2677 Ops/s $\color{#35bf28}+4.28\%$
test_non_tensor_env_rollout_speed[1000-single-False] 0.7156s 0.6157s 1.6243 Ops/s 1.5627 Ops/s $\color{#35bf28}+3.94\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] 1.7460s 1.6576s 0.6033 Ops/s 0.5878 Ops/s $\color{#35bf28}+2.64\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] 1.5209s 1.4403s 0.6943 Ops/s 0.6854 Ops/s $\color{#35bf28}+1.30\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-True] 1.9921s 1.9222s 0.5202 Ops/s 0.5088 Ops/s $\color{#35bf28}+2.25\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-False] 1.7632s 1.6978s 0.5890 Ops/s 0.5819 Ops/s $\color{#35bf28}+1.22\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] 4.7691s 4.6268s 0.2161 Ops/s 0.2142 Ops/s $\color{#35bf28}+0.90\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] 4.5681s 4.4202s 0.2262 Ops/s 0.2260 Ops/s $\color{#35bf28}+0.11\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] 2.0086s 1.8879s 0.5297 Ops/s 0.5264 Ops/s $\color{#35bf28}+0.62\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] 1.6949s 1.5888s 0.6294 Ops/s 0.6116 Ops/s $\color{#35bf28}+2.91\%$
test_values[generalized_advantage_estimate-True-True] 10.1863ms 9.9884ms 100.1166 Ops/s 100.5964 Ops/s $\color{#d91a1a}-0.48\%$
test_values[vec_generalized_advantage_estimate-True-True] 17.3515ms 11.5535ms 86.5538 Ops/s 56.6109 Ops/s $\textbf{\color{#35bf28}+52.89\%}$
test_values[td0_return_estimate-False-False] 0.2182ms 0.1212ms 8.2508 KOps/s 7.6031 KOps/s $\textbf{\color{#35bf28}+8.52\%}$
test_values[td1_return_estimate-False-False] 27.6347ms 27.3484ms 36.5653 Ops/s 36.4154 Ops/s $\color{#35bf28}+0.41\%$
test_values[vec_td1_return_estimate-False-False] 17.6942ms 11.5123ms 86.8635 Ops/s 55.9695 Ops/s $\textbf{\color{#35bf28}+55.20\%}$
test_values[td_lambda_return_estimate-True-False] 42.5242ms 41.0299ms 24.3725 Ops/s 24.5521 Ops/s $\color{#d91a1a}-0.73\%$
test_values[vec_td_lambda_return_estimate-True-False] 12.1213ms 11.3408ms 88.1768 Ops/s 56.9782 Ops/s $\textbf{\color{#35bf28}+54.76\%}$
test_gae_speed[generalized_advantage_estimate-False-1-512] 8.9484ms 8.8605ms 112.8602 Ops/s 113.5722 Ops/s $\color{#d91a1a}-0.63\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 1.7175ms 1.5266ms 655.0491 Ops/s 636.1047 Ops/s $\color{#35bf28}+2.98\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.6644ms 0.4311ms 2.3199 KOps/s 2.3791 KOps/s $\color{#d91a1a}-2.49\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 30.6135ms 30.1927ms 33.1206 Ops/s 33.5434 Ops/s $\color{#d91a1a}-1.26\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 2.0519ms 1.7682ms 565.5609 Ops/s 565.7122 Ops/s $\color{#d91a1a}-0.03\%$
test_dqn_speed[False-None] 1.8849ms 1.4354ms 696.6464 Ops/s 706.7841 Ops/s $\color{#d91a1a}-1.43\%$
test_dqn_speed[False-backward] 2.0481ms 1.9631ms 509.4055 Ops/s 513.1541 Ops/s $\color{#d91a1a}-0.73\%$
test_dqn_speed[True-None] 1.0366ms 0.6114ms 1.6356 KOps/s 1.6841 KOps/s $\color{#d91a1a}-2.88\%$
test_dqn_speed[True-backward] 1.1332ms 1.0945ms 913.6411 Ops/s 807.5767 Ops/s $\textbf{\color{#35bf28}+13.13\%}$
test_dqn_speed[reduce-overhead-None] 0.6858ms 0.5672ms 1.7631 KOps/s 1.7103 KOps/s $\color{#35bf28}+3.09\%$
test_ddpg_speed[False-None] 3.4359ms 2.9756ms 336.0687 Ops/s 349.0732 Ops/s $\color{#d91a1a}-3.73\%$
test_ddpg_speed[False-backward] 4.4814ms 4.2178ms 237.0925 Ops/s 242.4038 Ops/s $\color{#d91a1a}-2.19\%$
test_ddpg_speed[True-None] 1.9035ms 1.5192ms 658.2353 Ops/s 665.3558 Ops/s $\color{#d91a1a}-1.07\%$
test_ddpg_speed[True-backward] 2.7025ms 2.6248ms 380.9760 Ops/s 389.5716 Ops/s $\color{#d91a1a}-2.21\%$
test_ddpg_speed[reduce-overhead-None] 1.8884ms 1.4776ms 676.7546 Ops/s 671.3256 Ops/s $\color{#35bf28}+0.81\%$
test_sac_speed[False-None] 9.0785ms 8.4401ms 118.4815 Ops/s 121.6751 Ops/s $\color{#d91a1a}-2.62\%$
test_sac_speed[False-backward] 12.0358ms 11.6000ms 86.2073 Ops/s 86.6254 Ops/s $\color{#d91a1a}-0.48\%$
test_sac_speed[True-None] 2.5015ms 2.3199ms 431.0490 Ops/s 427.6579 Ops/s $\color{#35bf28}+0.79\%$
test_sac_speed[True-backward] 4.5901ms 4.3314ms 230.8722 Ops/s 224.3369 Ops/s $\color{#35bf28}+2.91\%$
test_sac_speed[reduce-overhead-None] 2.6798ms 2.2920ms 436.3029 Ops/s 415.3176 Ops/s $\textbf{\color{#35bf28}+5.05\%}$
test_redq_speed[False-None] 11.6382ms 11.0698ms 90.3357 Ops/s 88.0689 Ops/s $\color{#35bf28}+2.57\%$
test_redq_speed[False-backward] 24.4354ms 19.2989ms 51.8165 Ops/s 51.6256 Ops/s $\color{#35bf28}+0.37\%$
test_redq_speed[True-None] 5.1094ms 4.8116ms 207.8312 Ops/s 202.3591 Ops/s $\color{#35bf28}+2.70\%$
test_redq_speed[reduce-overhead-None] 5.1579ms 4.7992ms 208.3681 Ops/s 200.2564 Ops/s $\color{#35bf28}+4.05\%$
test_redq_deprec_speed[False-None] 12.4341ms 11.8440ms 84.4308 Ops/s 83.8649 Ops/s $\color{#35bf28}+0.67\%$
test_redq_deprec_speed[False-backward] 17.4868ms 17.0314ms 58.7149 Ops/s 58.0616 Ops/s $\color{#35bf28}+1.13\%$
test_redq_deprec_speed[True-None] 5.8529ms 3.9397ms 253.8276 Ops/s 256.9865 Ops/s $\color{#d91a1a}-1.23\%$
test_redq_deprec_speed[True-backward] 8.0857ms 7.8711ms 127.0475 Ops/s 122.1201 Ops/s $\color{#35bf28}+4.03\%$
test_redq_deprec_speed[reduce-overhead-None] 4.5094ms 3.8035ms 262.9173 Ops/s 260.4179 Ops/s $\color{#35bf28}+0.96\%$
test_td3_speed[False-None] 8.6848ms 8.5158ms 117.4282 Ops/s 120.3483 Ops/s $\color{#d91a1a}-2.43\%$
test_td3_speed[False-backward] 12.1670ms 11.6529ms 85.8157 Ops/s 88.9187 Ops/s $\color{#d91a1a}-3.49\%$
test_td3_speed[True-None] 1.9859ms 1.9347ms 516.8712 Ops/s 509.6013 Ops/s $\color{#35bf28}+1.43\%$
test_td3_speed[True-backward] 3.9024ms 3.7923ms 263.6947 Ops/s 262.2540 Ops/s $\color{#35bf28}+0.55\%$
test_td3_speed[reduce-overhead-None] 1.9790ms 1.9244ms 519.6316 Ops/s 514.9424 Ops/s $\color{#35bf28}+0.91\%$
test_cql_speed[False-None] 30.6804ms 27.7116ms 36.0860 Ops/s 36.2968 Ops/s $\color{#d91a1a}-0.58\%$
test_cql_speed[False-backward] 41.5533ms 37.3846ms 26.7490 Ops/s 26.4464 Ops/s $\color{#35bf28}+1.14\%$
test_cql_speed[True-None] 13.5411ms 13.2166ms 75.6623 Ops/s 74.4393 Ops/s $\color{#35bf28}+1.64\%$
test_cql_speed[True-backward] 19.5632ms 19.1666ms 52.1742 Ops/s 53.0975 Ops/s $\color{#d91a1a}-1.74\%$
test_cql_speed[reduce-overhead-None] 13.6588ms 13.2505ms 75.4688 Ops/s 75.9628 Ops/s $\color{#d91a1a}-0.65\%$
test_a2c_speed[False-None] 6.0828ms 5.6367ms 177.4083 Ops/s 178.9845 Ops/s $\color{#d91a1a}-0.88\%$
test_a2c_speed[False-backward] 12.6311ms 12.2792ms 81.4387 Ops/s 81.8763 Ops/s $\color{#d91a1a}-0.53\%$
test_a2c_speed[True-None] 4.1672ms 3.9812ms 251.1798 Ops/s 248.2146 Ops/s $\color{#35bf28}+1.19\%$
test_a2c_speed[True-backward] 9.9660ms 9.1811ms 108.9190 Ops/s 97.3199 Ops/s $\textbf{\color{#35bf28}+11.92\%}$
test_a2c_speed[reduce-overhead-None] 4.4114ms 3.9730ms 251.6992 Ops/s 240.1809 Ops/s $\color{#35bf28}+4.80\%$
test_ppo_speed[False-None] 6.2476ms 6.0256ms 165.9587 Ops/s 158.5933 Ops/s $\color{#35bf28}+4.64\%$
test_ppo_speed[False-backward] 13.3869ms 12.9722ms 77.0880 Ops/s 73.5798 Ops/s $\color{#35bf28}+4.77\%$
test_ppo_speed[True-None] 4.4249ms 3.9997ms 250.0211 Ops/s 243.2014 Ops/s $\color{#35bf28}+2.80\%$
test_ppo_speed[True-backward] 9.4834ms 9.1034ms 109.8494 Ops/s 106.2088 Ops/s $\color{#35bf28}+3.43\%$
test_ppo_speed[reduce-overhead-None] 4.3745ms 3.9644ms 252.2436 Ops/s 245.3303 Ops/s $\color{#35bf28}+2.82\%$
test_reinforce_speed[False-None] 5.1657ms 4.7772ms 209.3295 Ops/s 203.7487 Ops/s $\color{#35bf28}+2.74\%$
test_reinforce_speed[False-backward] 8.1751ms 7.7967ms 128.2591 Ops/s 123.8025 Ops/s $\color{#35bf28}+3.60\%$
test_reinforce_speed[True-None] 4.2903ms 3.2173ms 310.8227 Ops/s 307.6721 Ops/s $\color{#35bf28}+1.02\%$
test_reinforce_speed[True-backward] 8.9160ms 8.3154ms 120.2581 Ops/s 116.7623 Ops/s $\color{#35bf28}+2.99\%$
test_reinforce_speed[reduce-overhead-None] 3.5401ms 3.1345ms 319.0334 Ops/s 310.4296 Ops/s $\color{#35bf28}+2.77\%$
test_iql_speed[False-None] 21.7200ms 21.0026ms 47.6130 Ops/s 46.2171 Ops/s $\color{#35bf28}+3.02\%$
test_iql_speed[False-backward] 32.4082ms 31.6462ms 31.5994 Ops/s 30.4714 Ops/s $\color{#35bf28}+3.70\%$
test_iql_speed[True-None] 9.4829ms 8.9217ms 112.0869 Ops/s 107.1568 Ops/s $\color{#35bf28}+4.60\%$
test_iql_speed[True-backward] 17.6834ms 17.4167ms 57.4160 Ops/s 55.6643 Ops/s $\color{#35bf28}+3.15\%$
test_iql_speed[reduce-overhead-None] 9.1151ms 8.9332ms 111.9426 Ops/s 108.9115 Ops/s $\color{#35bf28}+2.78\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.2525ms 6.0741ms 164.6346 Ops/s 163.4351 Ops/s $\color{#35bf28}+0.73\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 3.1232ms 0.3823ms 2.6157 KOps/s 2.5430 KOps/s $\color{#35bf28}+2.86\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.7440ms 0.3739ms 2.6745 KOps/s 2.8467 KOps/s $\textbf{\color{#d91a1a}-6.05\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.1165ms 5.8354ms 171.3675 Ops/s 169.0924 Ops/s $\color{#35bf28}+1.35\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 2.4514ms 0.3783ms 2.6434 KOps/s 3.2637 KOps/s $\textbf{\color{#d91a1a}-19.00\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.6235ms 0.3655ms 2.7363 KOps/s 3.4727 KOps/s $\textbf{\color{#d91a1a}-21.20\%}$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.9455ms 1.4628ms 683.6374 Ops/s 723.6081 Ops/s $\textbf{\color{#d91a1a}-5.52\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.6266ms 1.3893ms 719.7705 Ops/s 773.6588 Ops/s $\textbf{\color{#d91a1a}-6.97\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 10.2110ms 6.1115ms 163.6264 Ops/s 164.4860 Ops/s $\color{#d91a1a}-0.52\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 2.0537ms 0.5351ms 1.8687 KOps/s 2.0956 KOps/s $\textbf{\color{#d91a1a}-10.83\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.7287ms 0.5169ms 1.9346 KOps/s 2.1943 KOps/s $\textbf{\color{#d91a1a}-11.84\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.0105ms 5.8569ms 170.7399 Ops/s 169.4254 Ops/s $\color{#35bf28}+0.78\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.8035ms 0.3899ms 2.5650 KOps/s 2.8449 KOps/s $\textbf{\color{#d91a1a}-9.84\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.5735ms 0.3608ms 2.7719 KOps/s 2.5767 KOps/s $\textbf{\color{#35bf28}+7.57\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.1111ms 5.7643ms 173.4804 Ops/s 170.3112 Ops/s $\color{#35bf28}+1.86\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 2.2335ms 0.3425ms 2.9196 KOps/s 2.7959 KOps/s $\color{#35bf28}+4.42\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.5447ms 0.3038ms 3.2912 KOps/s 3.0294 KOps/s $\textbf{\color{#35bf28}+8.64\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.1402ms 5.9560ms 167.8976 Ops/s 166.4414 Ops/s $\color{#35bf28}+0.87\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.3244ms 0.4880ms 2.0492 KOps/s 1.7467 KOps/s $\textbf{\color{#35bf28}+17.32\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.7095ms 0.4888ms 2.0457 KOps/s 1.8374 KOps/s $\textbf{\color{#35bf28}+11.34\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 6.4164ms 5.0628ms 197.5182 Ops/s 50.1798 Ops/s $\textbf{\color{#35bf28}+293.62\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 4.1552ms 2.0414ms 489.8642 Ops/s 542.7833 Ops/s $\textbf{\color{#d91a1a}-9.75\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 1.1145ms 0.9713ms 1.0295 KOps/s 1.0111 KOps/s $\color{#35bf28}+1.83\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 0.6564s 18.1894ms 54.9770 Ops/s 194.5181 Ops/s $\textbf{\color{#d91a1a}-71.74\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 11.7122ms 2.1125ms 473.3724 Ops/s 551.9697 Ops/s $\textbf{\color{#d91a1a}-14.24\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 7.2433ms 1.2793ms 781.6475 Ops/s 1.0381 KOps/s $\textbf{\color{#d91a1a}-24.70\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 6.8187ms 5.2796ms 189.4097 Ops/s 186.6272 Ops/s $\color{#35bf28}+1.49\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 13.9771ms 2.1561ms 463.8028 Ops/s 498.8118 Ops/s $\textbf{\color{#d91a1a}-7.02\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 1.4906ms 1.1272ms 887.1611 Ops/s 858.5860 Ops/s $\color{#35bf28}+3.33\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 44.1692ms 39.9477ms 25.0328 Ops/s 25.0094 Ops/s $\color{#35bf28}+0.09\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 20.7880ms 19.1797ms 52.1385 Ops/s 53.0712 Ops/s $\color{#d91a1a}-1.76\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 45.5397ms 41.5970ms 24.0402 Ops/s 24.1533 Ops/s $\color{#d91a1a}-0.47\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 21.4341ms 19.5720ms 51.0934 Ops/s 53.5111 Ops/s $\color{#d91a1a}-4.52\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 44.9037ms 42.9147ms 23.3020 Ops/s 22.7193 Ops/s $\color{#35bf28}+2.56\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 22.8237ms 20.8926ms 47.8639 Ops/s 49.0439 Ops/s $\color{#d91a1a}-2.41\%$
test_storage_write_lazystack[50-img_shape0-small] 0.9728ms 0.2365ms 4.2279 KOps/s 4.3367 KOps/s $\color{#d91a1a}-2.51\%$
test_storage_write_lazystack[100-img_shape1-atari] 1.6322ms 1.4521ms 688.6729 Ops/s 721.6422 Ops/s $\color{#d91a1a}-4.57\%$
test_storage_write_lazystack[100-img_shape2-large_img] 2.5377ms 2.3263ms 429.8641 Ops/s 415.9828 Ops/s $\color{#35bf28}+3.34\%$
test_storage_write_lazystack[200-img_shape3-large_batch] 3.1109ms 2.9398ms 340.1595 Ops/s 338.0660 Ops/s $\color{#35bf28}+0.62\%$
test_storage_write_contiguous[50-img_shape0-small] 0.2448ms 0.1384ms 7.2229 KOps/s 7.3192 KOps/s $\color{#d91a1a}-1.32\%$
test_storage_write_contiguous[100-img_shape1-atari] 0.3427ms 0.1885ms 5.3048 KOps/s 4.9226 KOps/s $\textbf{\color{#35bf28}+7.76\%}$
test_storage_write_contiguous[100-img_shape2-large_img] 1.8835ms 1.7721ms 564.2887 Ops/s 571.6501 Ops/s $\color{#d91a1a}-1.29\%$
test_storage_write_contiguous[200-img_shape3-large_batch] 1.5047ms 1.3109ms 762.8141 Ops/s 775.7378 Ops/s $\color{#d91a1a}-1.67\%$
test_collector_stack_then_write[50-img_shape0-small] 1.3332ms 1.1352ms 880.8721 Ops/s 876.7077 Ops/s $\color{#35bf28}+0.48\%$
test_collector_stack_then_write[100-img_shape1-atari] 7.5751ms 3.6708ms 272.4222 Ops/s 279.8430 Ops/s $\color{#d91a1a}-2.65\%$
test_collector_stack_then_write[100-img_shape2-large_img] 11.2430ms 5.6947ms 175.6014 Ops/s 176.5123 Ops/s $\color{#d91a1a}-0.52\%$
test_collector_stack_then_write[200-img_shape3-large_batch] 12.2660ms 7.4050ms 135.0435 Ops/s 141.9846 Ops/s $\color{#d91a1a}-4.89\%$
test_collector_lazystack_then_write[50-img_shape0-small] 0.4739ms 0.2923ms 3.4214 KOps/s 3.6067 KOps/s $\textbf{\color{#d91a1a}-5.14\%}$
test_collector_lazystack_then_write[100-img_shape1-atari] 1.7368ms 1.5517ms 644.4394 Ops/s 668.1758 Ops/s $\color{#d91a1a}-3.55\%$
test_collector_lazystack_then_write[100-img_shape2-large_img] 2.6125ms 2.4695ms 404.9404 Ops/s 397.4790 Ops/s $\color{#35bf28}+1.88\%$
test_collector_lazystack_then_write[200-img_shape3-large_batch] 3.4299ms 3.1648ms 315.9774 Ops/s 317.0195 Ops/s $\color{#d91a1a}-0.33\%$
test_collector_without_rb[100-img_shape0-atari] 33.1833ms 32.6696ms 30.6095 Ops/s 30.7480 Ops/s $\color{#d91a1a}-0.45\%$
test_collector_without_rb[200-img_shape1-large_batch] 67.1202ms 65.4248ms 15.2847 Ops/s 15.4015 Ops/s $\color{#d91a1a}-0.76\%$
test_collector_with_rb[100-img_shape0-atari] 39.7728ms 37.9393ms 26.3579 Ops/s 26.8934 Ops/s $\color{#d91a1a}-1.99\%$
test_collector_with_rb[200-img_shape1-large_batch] 88.2179ms 75.1779ms 13.3018 Ops/s 13.7069 Ops/s $\color{#d91a1a}-2.96\%$
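
The columns in the table above are related by simple arithmetic, which the sketch below reproduces. In the rows shown, `Ops` is the reciprocal of `Mean`, and `Change` is the relative difference between `Ops` and `Ops on Repo HEAD`. The 5% cutoff is an assumption inferred from the table, where the bold entries all exceed it and their counts match the "Improved: 17" and "Worsened: 23" totals. It is not taken from the benchmark bot's source, and the function names are hypothetical.

```python
# Illustrative helpers only; names and the 5% threshold are assumptions
# inferred from the table, not the benchmark bot's actual implementation.

# Time-unit suffixes, longest first so "ms" is not mistaken for "s".
# Both the Greek mu (U+03BC) and the micro sign (U+00B5) are accepted.
_TIME_UNITS = (("ms", 1e-3), ("\u03bcs", 1e-6), ("\u00b5s", 1e-6), ("s", 1.0))


def mean_to_ops(mean_cell: str) -> float:
    """Convert a 'Mean' cell such as '0.1488ms' to operations per second."""
    for suffix, seconds in _TIME_UNITS:
        if mean_cell.endswith(suffix):
            return 1.0 / (float(mean_cell[: -len(suffix)]) * seconds)
    raise ValueError(f"unrecognized time unit in {mean_cell!r}")


def relative_change(ops: float, head_ops: float) -> float:
    """Percent change of this PR's throughput relative to repo HEAD."""
    return (ops - head_ops) / head_ops * 100.0


def classify(change_pct: float, threshold_pct: float = 5.0) -> str:
    """Label a row using the assumed significance threshold."""
    if change_pct >= threshold_pct:
        return "improved"
    if change_pct <= -threshold_pct:
        return "worsened"
    return "unchanged"


if __name__ == "__main__":
    # First table row: 11.6458 KOps/s on the PR vs 12.3487 KOps/s on HEAD.
    change = relative_change(11.6458e3, 12.3487e3)
    print(f"{change:+.2f}% -> {classify(change)}")
```

Because many rows move by only a percent or two in either direction, a cutoff of this kind keeps ordinary run-to-run noise out of the headline counts.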

[ghstack-poisoned]
vmoens added a commit that referenced this pull request Apr 11, 2026
Cover all 7 performance features: _skip_maybe_reset, _StepMDP out= reuse,
_trust_step_output, update_traj_ids, combined optimization flags,
torch.compile fullgraph, and fast-path benchmarks.

Made-with: Cursor
ghstack-source-id: ffe15ef
Pull-Request: #3567
@vmoens vmoens closed this Apr 11, 2026