[Newton] Migrate manager based workflow to warp by hujc7 · Pull Request #4480 · isaac-sim/IsaacLab

hujc7 · 2026-01-28T10:30:28Z

Description

Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context.
List any dependencies that are required for this change.

Fixes # (issue)

Type of change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (existing functionality will not work without user modification)
Documentation update

Screenshots

Please attach before and after screenshots of the change if applicable.

Checklist

I have read and understood the contribution guidelines
I have run the pre-commit checks with ./isaaclab.sh --format
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
I have updated the changelog and the corresponding version in the extension's config/extension.toml file
I have added my name to the CONTRIBUTORS.md or my name already exists there

hujc7 · 2026-02-10T23:24:50Z

P1: /home/jichuanh/workspaces/isaac/data/manager-based/cp-torch-more-300.txt
P2: /home/jichuanh/workspaces/isaac/data/manager-based/cpv-G-ESRT.txt

Training result (last step)

iteration P1: 299/300; iteration P2: 299/300

metric	P1	P2	% change
Computation	302717 steps/s	266494 steps/s	-11.97%
Mean action noise std	0.06	0.09	+50.00%
Mean value_function loss	0.0000	0.0004	—
Mean surrogate loss	0.0062	0.0045	-27.42%
Mean entropy loss	-1.4284	-1.0168	-28.82%
Mean reward	4.94	4.93	-0.20%
Mean episode length	300.00	300.00	+0.00%
Episode_Reward/alive	1.0000	1.0000	+0.00%
Episode_Reward/terminating	0.0000	0.0000	—
Episode_Reward/pole_pos	-0.0084	-0.0108	+28.57%
Episode_Reward/cart_vel	-0.0035	-0.0042	+20.00%
Episode_Reward/pole_vel	-0.0011	-0.0014	+27.27%
Episode_Termination/time_out	0.9993	0.9976	-0.17%
Episode_Termination/cart_out_of_bounds	0.0007	0.0024	+242.86%
Total timesteps	19660800	19660800	+0.00%
Iteration time	0.22s	0.25s	+13.64%
Time elapsed	00:01:23	00:00:50	-39.76%

Step

gap P1: -4.72% (Σsub 7941.26 us/step vs timed 8334.90 us/step); gap P2: -13.46% (Σsub 1937.77 us/step vs timed 2239.05 us/step)

operation	P1 time/call (us)	P2 time/call (us)	% change (per-step)
Step (1)	8334.90	2239.05	-73.14%
Newton simulation step (2)	448.57	447.09	-0.33%
Scene.update (2)	160.93	175.88	+9.29%
Reset idx (1)	2993.70	342.93	-88.54%
Action preprocessing (1)	20.17	104.55	+418.40%
Reset selection (1)	61.61	92.86	+50.72%
ActionManager.apply_action (2)	424.85	31.63	-92.56%
Termination+Reward (1)	1314.45	39.25	-97.01%
ActionManager.process_action (1)	112.61	21.52	-80.89%
ObservationManager.compute (update_history) (1)	199.23	18.35	-90.79%
CommandManager.compute (1)	8.91	9.11	+2.21%
Scene.write_data_to_sim (2)	580.94	—	—

Reset idx

gap P1: -7.69% (Σsub 2763.60 us/step vs timed 2993.70 us/step); gap P2: -61.19% (Σsub 133.11 us/step vs timed 342.93 us/step)

operation	P1 time/call (us)	P2 time/call (us)	% change (per-step)
Reset idx (1)	2993.70	342.93	-88.54%
Action+Reward reset (1.0002)	471.28	26.06	-94.47%
EventManager.apply (reset) (1.0002)	1890.76	21.40	-98.87%
TerminationManager.reset (1.0002)	89.98	21.23	-76.40%
Scene.reset (1.0002)	252.66	17.34	-93.14%
ObservationManager.reset (1.0002)	10.92	9.42	-13.71%
CurriculumManager.compute (reset) (1.0002)	9.46	9.21	-2.63%
EventManager.reset (1.0002)	14.34	7.35	-48.72%
CurriculumManager.reset (1.0002)	9.08	7.28	-19.81%
CommandManager.reset (1.0002)	7.06	7.01	-0.70%
RecorderManager.reset (1.0002)	7.49	6.77	-9.58%

hujc7 · 2026-02-11T08:44:06Z

As timer itself has overhead and it also does a wp.sync, Data with only step timed

P1: /home/jichuanh/workspaces/isaac/data/manager-based/cpv-torch-baseline-less.txt
P2: /home/jichuanh/workspaces/isaac/data/manager-based/cpv-G-baseline-reset-step-off.txt

Training result (last step)

iteration P1: 299/300
iteration P2: 299/300

metric	P1	P2	% change
Computation	344628 steps/s	227807 steps/s	-33.90%
Mean action noise std	0.06	0.09	+50.00%
Mean value_function loss	0.0000	0.0004	—
Mean surrogate loss	0.0062	0.0045	-27.42%
Mean entropy loss	-1.4284	-1.0168	-28.82%
Mean reward	4.94	4.93	-0.20%
Mean episode length	300.00	300.00	+0.00%
Episode_Reward/alive	1.0000	1.0000	+0.00%
Episode_Reward/terminating	0.0000	0.0000	—
Episode_Reward/pole_pos	-0.0084	-0.0108	+28.57%
Episode_Reward/cart_vel	-0.0035	-0.0042	+20.00%
Episode_Reward/pole_vel	-0.0011	-0.0014	+27.27%
Episode_Termination/time_out	0.9993	0.9976	-0.17%
Episode_Termination/cart_out_of_bounds	0.0007	0.0024	+242.86%
Total timesteps	19660800	19660800	+0.00%
Iteration time	0.19s	0.29s	+52.63%
Time elapsed	00:01:14	00:00:53	-28.38%

Step

gap P1: -100.00% (Σsub 0.00 us/step vs timed 6974.04 us/step)
gap P2: -100.00% (Σsub 0.00 us/step vs timed 1431.46 us/step)

operation	P1 time/call (us)	P2 time/call (us)	% change (per-step)
Step (1)	6974.04	1431.46	-79.47%

hujc7 · 2026-02-13T10:54:47Z

covergence issue is fixed. However, with event manger migrated, there's still a slower to converge problem between mean episode length of 100-200, which is likely due to rng usage. Torch impl uses a single seed to sample events, but warp version has per env rng state, causing slightly different distribution over short period.

Below is the comparison of torch run vs captured warp run.
P1: data/regression-test/torch-baseline.txt
P2: data/regression-test/2026-02-13_02-32-13_single-manager-warp-sweep_Isaac-Cartpole-Warp-v0_ne4096_it300_rep1/00_default_only_2_r01.txt

Training result (last step)

iteration P1: 299/300
iteration P2: 299/300

metric	P1	P2	% change
Computation	319793 steps/s	591688 steps/s	+85.02%
Mean action noise std	0.07	0.09	+28.57%
Mean value_function loss	0.0000	0.0000	—
Mean surrogate loss	0.0060	0.0058	-3.33%
Mean entropy loss	-1.1987	-1.0135	-15.45%
Mean reward	4.93	4.94	+0.20%
Mean episode length	300.00	300.00	+0.00%
Episode_Reward/alive	1.0000	1.0000	+0.00%
Episode_Reward/terminating	0.0000	0.0000	—
Episode_Reward/pole_pos	-0.0089	-0.0099	+11.24%
Episode_Reward/cart_vel	-0.0037	-0.0037	-0.00%
Episode_Reward/pole_vel	-0.0012	-0.0013	+8.33%
Episode_Termination/time_out	0.9976	0.9983	+0.07%
Episode_Termination/cart_out_of_bounds	0.0024	0.0017	-29.17%
Total timesteps	19660800	19660800	+0.00%
Iteration time	0.20s	0.11s	-45.00%
Time elapsed	00:01:19	00:00:51	-35.44%

Step

gap P1: -100.00% (Σsub 0.00 us/step vs timed 8006.61 us/step)
gap P2: -100.00% (Σsub 0.00 us/step vs timed 1502.66 us/step)

operation	P1 time/call (us)	P2 time/call (us)	% change (per-step)
Step (1)	8006.61	1502.66	-81.23%

Training convergence

threshold	P1 first iter	P2 first iter	Δiter (P2-P1)
50	3	3	+0
100	6	17	+11
150	22	30	+8
200	27	35	+8
250	37	38	+1
300	48	49	+1

greptile-apps · 2026-02-18T06:13:15Z

Greptile Summary

This PR migrates the manager-based workflow to use Warp for CUDA-graph-friendly execution. The implementation adds a complete experimental Warp-first manager stack (isaaclab_experimental) with kernels for observations, rewards, actions, events, and terminations. A ManagerCallSwitch allows runtime selection between stable (torch), Warp non-captured, and Warp captured (CUDA graph) modes. The Newton articulation was refactored to support Warp-native mask resolution. A reference Cartpole task demonstrates the complete workflow.

Key Changes:

New manager_based_env_warp.py and manager_based_rl_env_warp.py with mode switching
Complete Warp-native manager implementations (action, observation, reward, termination, event, command)
Warp kernels for all MDP terms (observations, rewards, actions, events, terminations)
Newton articulation refactored with ArticulationData._resolve_1d_mask for Warp mask operations
Cartpole task as reference implementation
RL library wrappers updated for Warp array compatibility

Issues Found:

Timer decorators on step() methods hardcoded to enable=True in both stable and experimental envs (lines manager_based_rl_env.py:159 and manager_based_rl_env_warp.py:197), causing unconditional overhead even when section timers are disabled
SceneEntityCfg.resolve() assumes entity is always Articulation (line 39), will fail for RigidObject or other asset types when workflow is extended

Confidence Score: 4/5

Safe to merge as experimental feature with minor issues that don't affect core functionality
The implementation is well-structured and comprehensive, with proper CUDA graph support and mask-based operations. Two minor issues exist: (1) hardcoded timer overhead on step() methods and (2) SceneEntityCfg assumes Articulation type. These don't break current Cartpole usage but should be addressed before extending to other tasks. The scratch mask sharing in ArticulationData is documented and safe under CUDA stream ordering. Most previous review concerns have been resolved.
Pay attention to manager_based_rl_env.py and manager_based_rl_env_warp.py for the timer overhead issue, and scene_entity_cfg.py for the type assumption that will need fixing when extending beyond Articulation assets

Important Files Changed

Filename	Overview
source/isaaclab_experimental/isaaclab_experimental/envs/manager_based_env_warp.py	New Warp-based base environment with manager call switch and CUDA graph support; resolve_env_mask correctly handles Python lists
source/isaaclab_experimental/isaaclab_experimental/envs/manager_based_rl_env_warp.py	Warp-first RL environment extending base; Timer decorator hardcoded to enable=True on step() method
source/isaaclab/isaaclab/envs/manager_based_rl_env.py	Stable RL environment with added timing instrumentation; Timer decorator hardcoded to enable=True on step() method
source/isaaclab_experimental/isaaclab_experimental/managers/reward_manager.py	Warp-native reward computation with kernels for finalization and episode sum tracking; step_reward stores weighted rates
source/isaaclab_experimental/isaaclab_experimental/managers/action_manager.py	Warp-first action manager with mask-based reset operations; properly structured for CUDA graph capture
source/isaaclab_experimental/isaaclab_experimental/managers/observation_manager.py	Warp-native observation manager with pre-allocated buffers and in-place kernel execution; comprehensive implementation
source/isaaclab_experimental/isaaclab_experimental/managers/scene_entity_cfg.py	Scene entity configuration with Warp joint masks; resolve() assumes entity is always Articulation which will fail for other asset types
source/isaaclab_experimental/isaaclab_experimental/envs/mdp/actions/joint_actions.py	Warp-native joint action processing with scale/offset/clip transformations; properly handles subset joint selection
source/isaaclab_newton/isaaclab_newton/assets/articulation/articulation_data.py	Articulation data with Warp mask resolution utilities; _resolve_1d_mask uses shared scratch buffer (not re-entrant but safe under CUDA stream ordering)
source/isaaclab_tasks_experimental/isaaclab_tasks_experimental/manager_based/classic/cartpole/cartpole_env_cfg.py	Cartpole environment configuration with Warp-based workflow; properly structured config with single sim field definition

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[ManagerBasedRLEnvWarp] --> B[ManagerCallSwitch]
    A --> C[Scene]
    A --> D[Manager Infrastructure]
    
    B --> B1[STABLE Mode]
    B --> B2[WARP_NOT_CAPTURED Mode]
    B --> B3[WARP_CAPTURED Mode]
    
    B3 --> B3A[CUDA Graph Capture]
    
    D --> D1[ActionManager]
    D --> D2[ObservationManager]
    D --> D3[RewardManager]
    D --> D4[TerminationManager]
    D --> D5[EventManager]
    D --> D6[CommandManager]
    
    D1 --> E1[JointAction Terms]
    E1 --> E1A[Warp Kernels]
    E1A --> E1B[process_joint_actions_kernel]
    
    D2 --> E2[Observation Terms]
    E2 --> E2A[Warp Kernels]
    E2A --> E2B[joint_pos_rel_gather]
    E2A --> E2C[joint_vel_rel_gather]
    
    D3 --> E3[Reward Terms]
    E3 --> E3A[Warp Kernels]
    E3A --> E3B[reward_finalize]
    E3A --> E3C[sum_and_zero_masked]
    
    D4 --> E4[Termination Terms]
    E4 --> E4A[Warp Boolean Masks]
    
    D5 --> E5[Event Terms]
    E5 --> E5A[interval_step_per_env]
    E5A --> E5B[RNG State Management]
    
    D6 --> E6[Command Terms]
    E6 --> E6A[resample_time_left]
    E6A --> E6B[RNG State Management]
    
    C --> F[Newton Articulation]
    F --> F1[ArticulationData]
    F1 --> F2[Warp Mask Resolution]
    F2 --> F3[resolve_joint_mask]
    F2 --> F4[resolve_body_mask]
    
    style B3 fill:#90EE90
    style E1A fill:#87CEEB
    style E2A fill:#87CEEB
    style E3A fill:#87CEEB
    style E4A fill:#87CEEB
    style E5A fill:#87CEEB
    style E6A fill:#87CEEB
    style F2 fill:#87CEEB

_{Last reviewed commit: a247e5d}

greptile-apps

_{51 files reviewed, 9 comments}

_{Edit Code Review Agent Settings | Greptile}

greptile-apps · 2026-02-18T06:13:19Z

source/isaaclab_experimental/isaaclab_experimental/envs/manager_based_env_warp.py

+        # ids provided as python sequence (already normalized to list above)
+        if len(env_ids) == 0:
+            return self.ENV_MASK
+        raise ValueError(f"Unsupported env_ids type: {type(env_ids)}")


resolve_env_mask fails for non-empty Python lists

When env_ids is a non-empty Python list (e.g. [0, 2, 5]), the code reaches line 490 and unconditionally raises ValueError. The empty-list check on line 488 is handled, but non-empty lists are never converted to a Warp array and fed to the kernel. This means any caller that passes a plain Python list of env ids (a common pattern) will crash.

You likely need to convert the Python list to a wp.array and launch the kernel, similar to the ArticulationData._resolve_1d_mask implementation:

Suggested change

# ids provided as python sequence (already normalized to list above)

if len(env_ids) == 0:

return self.ENV_MASK

raise ValueError(f"Unsupported env_ids type: {type(env_ids)}")

# ids provided as python sequence (already normalized to list above)

if len(env_ids) == 0:

return self.ENV_MASK

ids_wp = wp.array(env_ids, dtype=wp.int32, device=self.device)

wp.launch(

kernel=_generate_env_mask_from_ids_int32,

dim=ids_wp.shape[0],

inputs=[self.ENV_MASK, ids_wp],

device=self.device,

)

return self.ENV_MASK

Note: this allocates a temporary wp.array which is not CUDA-graph-capture friendly, but it matches the behavior of ArticulationData._resolve_1d_mask and is required for correctness on non-captured paths.

greptile-apps · 2026-02-18T06:13:20Z

..._experimental/isaaclab_tasks_experimental/manager_based/classic/cartpole/cartpole_env_cfg.py

+    sim: SimulationCfg = SimulationCfg(
+        newton_cfg=NewtonCfg(
+            solver_cfg=MJWarpSolverCfg(
+                nconmax=5,
+            ),
+        )
+    )
+
+    # Scene settings
+    scene: CartpoleSceneCfg = CartpoleSceneCfg(num_envs=4096, env_spacing=4.0, clone_in_fabric=True)
+    # Basic settings
+    observations: ObservationsCfg = ObservationsCfg()
+    actions: ActionsCfg = ActionsCfg()
+    events: EventCfg = EventCfg()
+    # MDP settings
+    rewards: RewardsCfg = RewardsCfg()
+    terminations: TerminationsCfg = TerminationsCfg()
+    # Simulation settings
+    sim: SimulationCfg = SimulationCfg(
+        newton_cfg=NewtonCfg(
+            solver_cfg=MJWarpSolverCfg(
+                njmax=5,
+                nconmax=3,
+                ls_iterations=10,
+                cone="pyramidal",
+                impratio=1,
+                ls_parallel=True,
+                integrator="implicit",
+            ),
+            num_substeps=1,
+            debug_mode=False,
+            use_cuda_graph=True,
+        )
+    )


Duplicate sim field — first definition is silently overridden

CartpoleEnvCfg defines sim: SimulationCfg twice (lines 164-170 and lines 182-197). Because Python @configclass/@dataclass semantics allow field redefinition, the first assignment is silently overridden by the second. This makes the code confusing and the first definition on line 164 is dead code. Consider removing the first sim definition to avoid confusion.

greptile-apps · 2026-02-18T06:13:21Z

source/isaaclab/isaaclab/envs/manager_based_rl_env.py

@@ -149,6 +156,7 @@ def setup_manager_visualizers(self):
    Operations - MDP
    """


Always-on timer decorator on step() in the stable env

The @Timer(name="env_step", ..., enable=True) decorator is hardcoded to enable=True, meaning every call to step() in the stable ManagerBasedRLEnv now incurs timer overhead unconditionally. The TIMER_ENABLED_STEP flag only controls the inner section timers. This should likely use enable=TIMER_ENABLED_STEP or be controlled by the same flag to avoid always-on overhead in production use.

Suggested change

"""

@Timer(name="env_step", msg="Step took:", enable=TIMER_ENABLED_STEP, format="us")

greptile-apps · 2026-02-18T06:13:22Z

source/isaaclab_experimental/isaaclab_experimental/envs/manager_based_rl_env_warp.py

+        self.reset_terminated = self.termination_manager.terminated
+        self.reset_time_outs = self.termination_manager.time_outs
+
+    @Timer(name="env_step", msg="Step took:", enable=True, format="us")


Always-on timer decorator on step() in experimental env

Same as the stable env: the @Timer(name="env_step", ..., enable=True) decorator is hardcoded to enable=True. This means every call to step() incurs timer overhead unconditionally, even though the TIMER_ENABLED_STEP flag controls the inner section timers. Consider using enable=TIMER_ENABLED_STEP for consistency.

Suggested change

@Timer(name="env_step", msg="Step took:", enable=True, format="us")

@Timer(name="env_step", msg="Step took:", enable=TIMER_ENABLED_STEP, format="us")

greptile-apps · 2026-02-18T06:13:23Z

source/isaaclab_experimental/isaaclab_experimental/envs/manager_based_env_warp.py

+    def _manager_name_from_stage(self, stage: str) -> str:
+        if "_" not in stage:
+            raise ValueError(f"Invalid stage '{stage}'. Expected '{{manager_name}}_{{function_name}}'.")
+        return stage.split("_", 1)[0]


Stage name parsing assumes single underscore delimiter

_manager_name_from_stage splits on the first _ to derive the manager name (e.g., "ActionManager_process_action" → "ActionManager"). However, Scene_write_data_to_sim and Scene_reset use "Scene" as the prefix, which is not in MANAGER_NAMES. This means get_mode_for_manager("Scene") will always fall through to the default mode. While this works by accident (the default is WARP_CAPTURED), it means the Scene stage can never be independently configured to a different mode.

This isn't a bug per se, but worth being aware of since it creates an implicit coupling between the Scene call mode and the default configuration.

greptile-apps · 2026-02-18T06:13:24Z

source/isaaclab_experimental/isaaclab_experimental/managers/reward_manager.py

+
+@wp.kernel
+def _reward_pre_compute_reset(
+    # output
+    reward_buf: wp.array(dtype=wp.float32),
+    step_reward: wp.array(dtype=wp.float32, ndim=2),
+    term_outs: wp.array(dtype=wp.float32, ndim=2),


_reward_finalize stores weighted reward rate but uses it as step reward

The comment on line 68 says "store weighted reward rate (matches old: value/dt)", but the code stores raw * weight without dividing by dt. Line 73 then multiplies by dt to compute the actual step reward contribution (val = weighted * dt). The step_reward array appears to store weighted-but-not-dt-scaled values, which is indeed a "rate" interpretation. However, this could be confusing for downstream consumers of _step_reward_wp (used in get_active_iterable_terms) — the logged values represent reward rates, not actual per-step rewards. Worth verifying this matches the expected semantics.

greptile-apps · 2026-02-18T06:13:25Z

source/isaaclab_experimental/isaaclab_experimental/managers/command_manager.py

+        s = rng_state[env_id]
+        time_left[env_id] = wp.randf(s, lower, upper)
+        rng_state[env_id] = s + wp.uint32(1)


RNG state increment uses addition instead of wp.randf

In _resample_time_left_and_increment_counter, the RNG state is advanced by simple addition (rng_state[env_id] = s + wp.uint32(1)), unlike other kernels in this PR (e.g., in event_manager.py and noise_model.py) that advance the state implicitly through wp.randf / wp.randn calls. After wp.randf on line 75, the state s has already been advanced internally by Warp. Writing s + 1 back effectively skips one RNG output. While this won't cause a crash, it may produce slightly different random streams compared to the pattern used elsewhere in the codebase.

Suggested change

s = rng_state[env_id]

time_left[env_id] = wp.randf(s, lower, upper)

rng_state[env_id] = s + wp.uint32(1)

rng_state[env_id] = s

greptile-apps · 2026-02-18T06:13:26Z

source/isaaclab_experimental/isaaclab_experimental/managers/scene_entity_cfg.py

+    def resolve(self, scene: InteractiveScene):
+        # run the stable resolution first (fills joint_ids/body_ids from names/regex)
+        super().resolve(scene)
+
+        # Build a Warp joint mask for articulations.
+        entity: Articulation = scene[self.name]
+
+        # Pre-allocate a full-length mask (all True for default selection).
+        if self.joint_ids == slice(None):
+            joint_ids_list = list(range(entity.num_joints))
+            mask_list = [True] * entity.num_joints
+        else:
+            joint_ids_list = list(self.joint_ids)
+            mask_list = [False] * entity.num_joints
+            for idx in joint_ids_list:
+                mask_list[idx] = True
+        self.joint_mask = wp.array(mask_list, dtype=wp.bool, device=scene.device)
+        self.joint_ids_wp = wp.array(joint_ids_list, dtype=wp.int32, device=scene.device)


resolve() assumes the entity is always an Articulation

SceneEntityCfg.resolve() unconditionally casts scene[self.name] to Articulation (line 39) and accesses entity.num_joints. If the entity is a different type (e.g., a RigidObject), this will fail with an AttributeError. The base class SceneEntityCfg supports non-articulation entities, so this override should either check the entity type or guard against it.

Since this is currently only used for the Cartpole task (which is always an Articulation), this isn't immediately broken, but it will be a problem when the experimental workflow is extended to other asset types.

greptile-apps · 2026-02-18T06:13:27Z

source/isaaclab_newton/isaaclab_newton/assets/articulation/articulation_data.py

+        mask: wp.array | torch.Tensor | None,
+        all_mask: wp.array,
+        scratch_mask: wp.array,
+    ) -> wp.array:
+        """Resolve ids/mask into a warp boolean mask.
+
+        Notes:
+        - Returns ``all_mask`` when both ids and mask are None (or ids is slice(None)).
+        - If ids are provided and mask is None, this populates ``scratch_mask`` in-place using Warp kernels.
+        - Torch inputs are supported for compatibility, but are generally not CUDA-graph-capture friendly.
+        """
+        # Fast path: explicit mask provided.
+        if mask is not None:
+            if isinstance(mask, torch.Tensor):
+                # Ensure boolean + correct device, then wrap for Warp.
+                if mask.dtype != torch.bool:
+                    mask = mask.to(dtype=torch.bool)
+                if str(mask.device) != self.device:
+                    mask = mask.to(self.device)
+                return wp.from_torch(mask, dtype=wp.bool)
+            return mask
+
+        # Fast path: ids == all / not specified.
+        if ids is None:
+            return all_mask
+        if isinstance(ids, slice) and ids == slice(None):
+            return all_mask
+
+        # Normalize ids into a 1D wp.int32 array.
+        if isinstance(ids, slice):
+            # Convert to explicit indices (supports partial slices).
+            # We infer the valid range from the scratch mask length.
+            start, stop, step = ids.indices(scratch_mask.shape[0])
+            ids = list(range(start, stop, step))
+
+        if isinstance(ids, wp.array):
+            ids_wp = ids
+        elif isinstance(ids, torch.Tensor):
+            if ids.dtype != torch.int32:
+                ids = ids.to(dtype=torch.int32)
+            if str(ids.device) != self.device:
+                ids = ids.to(self.device)
+            ids_wp = wp.from_torch(ids, dtype=wp.int32)
+        else:
+            ids_list = list(ids)
+            ids_wp = wp.array(ids_list, dtype=wp.int32, device=self.device) if len(ids_list) > 0 else None
+
+        # Populate scratch mask.
+        scratch_mask.fill_(False)
+        if ids_wp is not None:
+            wp.launch(
+                kernel=generate_mask_from_ids,
+                dim=ids_wp.shape[0],
+                inputs=[scratch_mask, ids_wp],
+                device=self.device,


Scratch mask is shared — concurrent callers will race

_resolve_1d_mask populates a shared scratch buffer (scratch_mask) in-place. If multiple callers resolve masks concurrently (e.g., resolve_joint_mask for two different action terms in the same step, since both use JOINT_MASK as the scratch), the second call will overwrite the first caller's mask before it's consumed.

This is acknowledged in manager_based_env_warp.py with the comment "Not thread-safe / re-entrant", but in the articulation data there's no such warning. The Newton articulation now uses resolve_joint_mask in many places (all the set_joint_* methods). If two action terms call set_joint_effort_target with different joint subsets in sequence, the first term's resolved joint mask could be overwritten by the second term before the first kernel launch consumes it. This is safe only if Warp kernel launches are queued (CUDA stream ordering), but worth documenting or verifying.

Introduces ManagerBasedEnvWarp and ManagerBasedRLEnvWarp with ManagerCallSwitch for per-manager stable/warp/captured mode selection. Includes Newton articulation data extensions, RL wrapper adaptations, and train script integration.

Warp-first implementations of all 7 managers (action, observation, reward, termination, event, command, recorder) with mask-based reset for CUDA graph compatibility. Includes MDP term library (observations, rewards, terminations, events, actions), IO descriptors, and utility modules (noise, modifiers, circular buffers, warp kernels).

Cartpole env config for the warp manager-based RL env, registered as Isaac-Cartpole-Warp-v0. Includes warp-first custom reward term (joint_pos_target_l2).

hujc7 · 2026-02-20T10:15:28Z

@greptileai review updated.

greptile-apps

_{52 files reviewed, no comments}

_{Edit Code Review Agent Settings | Greptile}

hujc7 · 2026-02-23T08:35:48Z

@greptileai can you do a full review without considering previous comments? Those should have been all addressed.

greptile-apps

_{52 files reviewed, 3 comments}

_{Edit Code Review Agent Settings | Greptile}

greptile-apps · 2026-02-23T08:40:23Z

source/isaaclab/isaaclab/envs/manager_based_rl_env.py

    Operations - MDP
    """

+    @Timer(name="env_step", msg="Step took:", enable=True, format="us")


Hardcoded enable=True means this timer always runs, even when TIMER_ENABLED_STEP=False. Consider changing to enable=TIMER_ENABLED_STEP for consistency with the inner section timers.

greptile-apps · 2026-02-23T08:40:24Z

source/isaaclab_experimental/isaaclab_experimental/envs/manager_based_rl_env_warp.py

+        self.reset_terminated = self.termination_manager.terminated
+        self.reset_time_outs = self.termination_manager.time_outs
+
+    @Timer(name="env_step", msg="Step took:", enable=True, format="us")


Same as stable env: hardcoded enable=True causes unconditional timer overhead. Consider using enable=TIMER_ENABLED_STEP to match the inner section timers.

greptile-apps · 2026-02-23T08:40:25Z

source/isaaclab_experimental/isaaclab_experimental/managers/scene_entity_cfg.py

+        # Build a Warp joint mask for articulations only.
+        entity = scene[self.name]
+        if not isinstance(entity, BaseArticulation):
+            return


Type check needed before accessing entity.num_joints. When workflow extends beyond Cartpole to other asset types (e.g., RigidObject), this will fail with AttributeError. Add isinstance(entity, BaseArticulation) check or handle gracefully.

kellyguo11 · 2026-02-26T00:58:14Z

source/isaaclab/isaaclab/envs/manager_based_rl_env.py

        """
        # process actions
-        self.action_manager.process_action(action.to(self.device))
+        with Timer(


maybe we can remove these annotations so that they don't introduce additional overhead

They do have quite some overhead if on, but should be no overhead if off. Nice to keep until migration is done to explore per manager further improvements. But it's also something that can be kept as a patch and not merged.

kellyguo11 · 2026-02-26T00:58:28Z

scripts/reinforcement_learning/rsl_rl/train.py

    help="Enable Isaac Lab timers (use --no-timer to disable).",
 )
+parser.add_argument(
+    "--manager_call_config",


what does this flag do?

It switches manager and calls to 3 modes, torch, warp and warp capture, which also affect which manager to import.
From CLI perspective, it's a way to debug any potential manager problems, which is nice to keep until migration is done.
From env perspective, it provides way to downgrade not-capturable managers calls to warp. But this usage probably does not need CLI argument.

kellyguo11 · 2026-02-26T01:00:54Z

source/isaaclab/isaaclab/envs/manager_based_rl_env.py

 from .manager_based_env import ManagerBasedEnv
 from .manager_based_rl_env_cfg import ManagerBasedRLEnvCfg

+# Controls per-section timing inside `step()`.


changes in this file look all related to capturing timings. I think if we are ready to merge this, we can revert these changes

kellyguo11 · 2026-02-26T01:07:09Z

source/isaaclab_newton/isaaclab_newton/assets/articulation/articulation.py

-        # solve for None masks
-        if env_mask is None:
-            env_mask = self._data.ALL_ENV_MASK
+        env_mask = self._data.resolve_env_mask(env_ids=env_ids, env_mask=env_mask)


@AntoineRichard could you take a look at these articulation changes?

kellyguo11 · 2026-02-26T01:08:25Z

source/isaaclab/isaaclab/managers/manager_base.py

+        # TODO(jichuanh): This might not be reasonable here. Revisit
+        # Materialize default SceneEntityCfg kwargs (e.g. asset_cfg=SceneEntityCfg("robot")) into params.
+        # Otherwise, defaults live only in the callable signature and never get resolved/cached by the manager.
+        signature = inspect.signature(func_static)


hmm I'm not too sure about the logic in this part. maybe @ooctipus could give this a quick look?

The problem

Many MDP term functions declare a SceneEntityCfg as a keyword argument default:

def reset_root_state_uniform(
env, env_mask, pose_range, velocity_range,
asset_cfg: SceneEntityCfg = SceneEntityCfg("robot"), # default kwarg
):

When a user's env config doesn't explicitly pass asset_cfg in term_cfg.params, the key "asset_cfg" never appears in
term_cfg.params at all — it only exists as a default in the function signature.

Why that's a problem

_process_term_cfg_at_play (line 384) iterates over term_cfg.params.items() and calls value.resolve(self._env.scene) on
every SceneEntityCfg it finds. This resolution step:

Looks up the asset in the scene by name ("robot")

Resolves string joint_names / body_names into integer joint_ids / body_ids

Caches the resolved ids on the SceneEntityCfg object

If the SceneEntityCfg lives only as a function signature default and is never in term_cfg.params, it never gets resolved.
At call time, Python would supply the unresolved default object — with joint_ids=None, body_ids=None, no scene lookup done.

What the materialization does

Lines 356-362 inspect the function signature, find any parameter whose default is a SceneEntityCfg, and copy it into
term_cfg.params (if not already there). This makes it visible to _process_term_cfg_at_play, which then resolves it like any
explicitly-provided SceneEntityCfg.

The .copy() on line 362 is important — without it, every term instance sharing the same function would mutate the same
default object (the classic Python mutable-default-argument pitfall).

This is more of warp needed things as the mask need to be pre-resolved from None into wp array,

github-actions bot added the isaac-lab Related to Isaac Lab team label Jan 28, 2026

hujc7 changed the title ~~[Newton] Migrate manager based workflow to warp~~ [WIP][Newton] Migrate manager based workflow to warp Jan 28, 2026

hujc7 force-pushed the dev-newton-warp-mig-manager-based branch from d4796f8 to ef0e0f6 Compare January 28, 2026 10:32

hujc7 force-pushed the dev-newton-warp-mig-manager-based branch from 247d950 to 108a707 Compare February 10, 2026 09:23

hujc7 marked this pull request as ready for review February 18, 2026 06:06

hujc7 requested review from Mayankm96 and kellyguo11 as code owners February 18, 2026 06:06

greptile-apps bot reviewed Feb 18, 2026

View reviewed changes

hujc7 force-pushed the dev-newton-warp-mig-manager-based branch from 7a309b8 to 94ce315 Compare February 18, 2026 06:19

hujc7 added 5 commits February 20, 2026 01:36

Add warp manager-based RL env infrastructure

74675fc

Introduces ManagerBasedEnvWarp and ManagerBasedRLEnvWarp with ManagerCallSwitch for per-manager stable/warp/captured mode selection. Includes Newton articulation data extensions, RL wrapper adaptations, and train script integration.

Add warp Cartpole task configuration

f107b29

Cartpole env config for the warp manager-based RL env, registered as Isaac-Cartpole-Warp-v0. Includes warp-first custom reward term (joint_pos_target_l2).

Address greptile comments

bc08182

Adapted to latest dev/newton

a247e5d

hujc7 force-pushed the dev-newton-warp-mig-manager-based branch from 94ce315 to a247e5d Compare February 20, 2026 10:12

hujc7 requested a review from hhansen-bdai as a code owner February 20, 2026 10:12

greptile-apps bot reviewed Feb 20, 2026

View reviewed changes

hujc7 changed the title ~~[WIP][Newton] Migrate manager based workflow to warp~~ [Newton] Migrate manager based workflow to warp Feb 23, 2026

greptile-apps bot reviewed Feb 23, 2026

View reviewed changes

hujc7 mentioned this pull request Feb 23, 2026

[Newton] Migrate more envs and mdps to warp #4690

Merged

5 tasks

kellyguo11 reviewed Feb 26, 2026

View reviewed changes

kellyguo11 merged commit 0363cbc into isaac-sim:dev/newton Feb 26, 2026
5 of 11 checks passed

hujc7 mentioned this pull request Mar 5, 2026

[Exp] Cherry-pick manager-based warp env infrastructure from dev/newton #4829

Open

1 task

		@@ -149,6 +156,7 @@ def setup_manager_visualizers(self):
		Operations - MDP
		"""

	"""
	@Timer(name="env_step", msg="Step took:", enable=TIMER_ENABLED_STEP, format="us")

Conversation

hujc7 commented Jan 28, 2026

Description

Type of change

Screenshots

Checklist

Uh oh!

hujc7 commented Feb 10, 2026

Training result (last step)

Step

Reset idx

Uh oh!

hujc7 commented Feb 11, 2026

Training result (last step)

Step

Uh oh!

hujc7 commented Feb 13, 2026

Training result (last step)

Step

Training convergence

Uh oh!

greptile-apps bot commented Feb 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Flowchart

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Feb 18, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Feb 18, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Feb 18, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Feb 18, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Feb 18, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Feb 18, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Feb 18, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Feb 18, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Feb 18, 2026

Choose a reason for hiding this comment

Uh oh!

hujc7 commented Feb 20, 2026

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Uh oh!

hujc7 commented Feb 23, 2026

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Feb 23, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Feb 23, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Feb 23, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

greptile-apps bot commented Feb 18, 2026 •

edited

Loading