
[Exp] Cherry-pick manager-based warp env infrastructure from dev/newton#4829

Open
hujc7 wants to merge 3 commits into isaac-sim:develop from hujc7:develop-manager-warp-cp

Conversation

@hujc7

@hujc7 hujc7 commented Mar 5, 2026

Summary

Cherry-pick of warp manager-based env infrastructure from dev/newton, refactored for develop.

isaaclab_experimental

  • Added warp-compatible manager implementations (ActionManager, ObservationManager, EventManager,
    CommandManager, TerminationManager, RewardManager) with Warp kernel execution and CUDA graph
    capture support.
  • Added ManagerCallSwitch utility for per-manager eager/captured dispatch, configured via
    MANAGER_CALL_CONFIG env var.
  • Added ManagerBasedEnvWarp and ManagerBasedRLEnvWarp orchestration env classes.
  • Added warp MDP terms (observations, rewards, terminations, events, joint actions).
  • Added utility modules: buffers (circular buffer), modifiers, noise models, warp kernels/helpers.
  • Added experimental SceneEntityCfg with warp joint mask/ids for kernel-level joint selection.
  • Generalized configclass default materialization in ManagerBase for automatic SceneEntityCfg resolution.
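
The per-manager eager/captured dispatch can be sketched as follows. The mode values (0/1/2) and the "default" fallback match the behavior discussed in the review threads below; the function shape and config dict are illustrative assumptions, not the actual ManagerCallSwitch API:

```python
from enum import IntEnum


class ManagerCallMode(IntEnum):
    STABLE = 0             # dispatch to the stable torch-based manager
    WARP_NOT_CAPTURED = 1  # run Warp kernels eagerly
    WARP_CAPTURED = 2      # replay a pre-captured CUDA graph


def get_mode_for_manager(cfg: dict, manager_name: str) -> ManagerCallMode:
    """Resolve a manager's execution mode, falling back to the 'default' entry."""
    return ManagerCallMode(cfg.get(manager_name, cfg["default"]))


# e.g. the result of parsing MANAGER_CALL_CONFIG with Scene capped to eager mode
cfg = {"default": 2, "Scene": 1}
```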

isaaclab_tasks_experimental

  • Added Isaac-Cartpole-Warp-v0 task as reference environment for warp manager-based workflow.

isaaclab_rl

  • Updated rsl_rl, rl_games, sb3, skrl wrappers to accept ManagerBasedRLEnvWarp and DirectRLEnvWarp.

isaaclab

  • Fixed SettingsManager to catch RuntimeError when carb is unavailable.
  • Minor comment cleanup in ObservationManager.
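
The SettingsManager fix amounts to widening an exception-catch tuple so the manager degrades gracefully when carb raises at import time. A hedged sketch of the pattern (the function name and structure are assumptions; the real code lives in source/isaaclab/isaaclab/app/settings_manager.py):

```python
def load_carb_settings():
    """Return carb settings if available, else None (degrade gracefully outside Omniverse)."""
    try:
        import carb  # only importable inside an Omniverse/Isaac Sim runtime

        return carb.settings.get_settings()
    except (ImportError, AttributeError, RuntimeError):
        # RuntimeError is the newly added member of the catch tuple: some carb
        # builds raise it at import time rather than failing with ImportError.
        return None
```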

Notes

Known limitations

  • Scene_write_data_to_sim capped to mode=1 (eager) via MAX_MODE_OVERRIDES — articulation
    _apply_actuator_model uses wp.to_torch + torch indexing, not CUDA graph capture-safe.

Test plan

  • Isaac-Cartpole-Warp-v0 training (4096 envs, 300 iters, mode=2): converges (reward 4.95, ep_len 300)

@github-actions github-actions bot added the documentation (Improvements or additions to documentation) and isaac-lab (Related to Isaac Lab team) labels Mar 5, 2026
@hujc7
Author

hujc7 commented Mar 5, 2026

@greptile review

@hujc7 hujc7 changed the title Cherry-pick manager-based warp env infrastructure from dev/newton [Experimental] Cherry-pick manager-based warp env infrastructure from dev/newton Mar 5, 2026
@hujc7
Author

hujc7 commented Mar 5, 2026

WIP, but putting it in review for a bot review.

@greptile-apps
Contributor

greptile-apps bot commented Mar 5, 2026

Greptile Summary

This PR cherry-picks the Warp manager-based environment infrastructure from dev/newton into develop, adding a full experimental stack of CUDA-graph-friendly manager implementations (ActionManager, ObservationManager, EventManager, TerminationManager, RewardManager, CommandManager), orchestration classes (ManagerBasedEnvWarp, ManagerBasedRLEnvWarp), Warp MDP terms, and supporting utilities (ManagerCallSwitch, WarpGraphCache, CircularBuffer, WarpCapturable). The reference task Isaac-Cartpole-Warp-v0 is included and reported to converge (reward 4.95, ep_len 300 over 4096 envs, 300 iters, mode=2).

Key observations from this review:

  • Previous review issues are addressed: _load_cfg MAX_MODE_OVERRIDES is now applied on the default-config path; stable_call=None guard is in place; WarpGraphCache now captures-and-immediately-replays on first call; events.py and terminations.py asserts have been replaced with explicit raise; TIMER_ENABLED_STEP is respected on the outer step() decorator; the resolve_asset_cfg dead code was removed.
  • Remaining minor inconsistency: EventManager._apply_interval (lines 331, 337) and _apply_reset (line 370) still contain assert statements in hot-path dispatch methods — the same pattern fixed everywhere else in this PR. These are internal invariants that are always True under normal construction order, but they'd be silently skipped under Python -O and would produce an opaque TypeError/AttributeError on refactoring rather than a clear AssertionError. This is a style issue and should be addressed for consistency.
  • The shared-scratch-mask latent bug (ENV_MASK / _scratch_term_mask_wp) is acknowledged and tracked, with no current code path triggering it.
  • The Scene_write_data_to_sim mode=1 cap via MAX_MODE_OVERRIDES is clearly documented and correctly enforced.
  • RL library wrappers (rsl_rl, rl_games, sb3, skrl) are extended consistently to accept the new Warp env types.

Confidence Score: 4/5

  • Safe to merge as experimental infrastructure; the one remaining style inconsistency (asserts in EventManager hot paths) is non-blocking and doesn't affect the tested Cartpole task.
  • Score reflects a large, well-structured experimental cherry-pick where nearly all previously identified issues have been resolved. The remaining item is a minor style inconsistency (assert vs explicit raise) in EventManager hot-path methods. The reference task converges and the core CUDA graph capture infrastructure is sound.
  • source/isaaclab_experimental/isaaclab_experimental/managers/event_manager.py — _apply_interval and _apply_reset still have assert guards inconsistent with the rest of this PR.

Important Files Changed

Filename Overview
source/isaaclab_experimental/isaaclab_experimental/utils/manager_call_switch.py Core dispatch utility routing manager stages through STABLE/WARP_NOT_CAPTURED/WARP_CAPTURED modes. MAX_MODE_OVERRIDES now correctly applied on all config paths. Stable-call guard and validation are properly in place. Logic looks correct.
source/isaaclab_experimental/isaaclab_experimental/utils/warp_graph_cache.py WarpGraphCache captures on first call and immediately replays, caching the return value. First-call silent-skip bug (previous thread) is resolved. Logic is clean and correct.
source/isaaclab_experimental/isaaclab_experimental/envs/manager_based_env_warp.py Base Warp environment. MAX_MODE_OVERRIDES path is fixed. Shared ENV_MASK scratch-mask latent bug is acknowledged and tracked. resolve_env_mask, rng_state_wp initialization, and _apply_manager_term_cfg_profile logic all look sound.
source/isaaclab_experimental/isaaclab_experimental/envs/manager_based_rl_env_warp.py RL Warp environment. Outer step() timer now correctly uses TIMER_ENABLED_STEP. _action_in_wp guard uses RuntimeError. step() pipeline (process_action → decimation loop → termination → reward → reset → obs) matches documented behavior.
source/isaaclab_experimental/isaaclab_experimental/managers/event_manager.py Warp-first EventManager with interval/reset mask-based dispatch. Several assert statements remain in hot-path _apply_interval and _apply_reset methods that should be explicit raises for consistency with the rest of this PR.
source/isaaclab_experimental/isaaclab_experimental/managers/termination_manager.py Warp-first TerminationManager. assert statements replaced with explicit ValueError raises. Per-term view construction via column-stride pointer arithmetic is correct. Capture-safe pre-compute reset and finalize kernels look correct.
source/isaaclab_experimental/isaaclab_experimental/managers/reward_manager.py Warp-first RewardManager. Row-stride per-term output view construction is correct. _reward_pre_compute_reset, _reward_finalize, and _sum_and_zero_masked kernels look correct including their 2D index conventions.
source/isaaclab_experimental/isaaclab_experimental/managers/observation_manager.py Warp-first ObservationManager with pre-allocated output buffers, scale/clip Warp kernels, and CircularBuffer history. Logic is complex but appears correct for the Cartpole reference task.
source/isaaclab_experimental/isaaclab_experimental/managers/manager_base.py Experimental ManagerBase adding configclass-default materialization for automatic SceneEntityCfg resolution. Logic matches stable counterpart with expected warp-mask-based additions.
source/isaaclab_experimental/isaaclab_experimental/envs/mdp/events.py Warp-first reset_joints_by_offset/scale event terms. assert statements replaced with explicit ValueError/AttributeError. Direct Warp buffer writes for joint state are correct.
source/isaaclab_experimental/isaaclab_experimental/envs/mdp/terminations.py Warp-first termination terms. assert statements replaced with explicit ValueError. joint_pos_out_of_manual_limit correctly uses joint_mask and validates dimensions.
source/isaaclab_experimental/isaaclab_experimental/utils/warp/utils.py resolve_1d_mask (ids/mask → wp.bool array), wrap_to_pi warp func, and WarpCapturable decorator. resolve_asset_cfg dead code removed. Logic is correct.
source/isaaclab_experimental/isaaclab_experimental/managers/scene_entity_cfg.py Experimental SceneEntityCfg subclass adding joint_mask (bool) and joint_ids_wp (int32) Warp arrays. resolve() correctly delegates to stable super().resolve() first, then builds Warp arrays.
source/isaaclab/isaaclab/app/settings_manager.py Minimal stable fix: adds RuntimeError to the carb ImportError/AttributeError catch tuple so SettingsManager degrades gracefully when carb raises at import time.
source/isaaclab_rl/isaaclab_rl/rsl_rl/vecenv_wrapper.py Adds DirectRLEnvWarp and ManagerBasedRLEnvWarp to allowed_types tuple via try/except ImportError at runtime. TYPE_CHECKING contextlib.suppress pattern is acknowledged as low-priority by maintainer.

Sequence Diagram

sequenceDiagram
    participant RL as RL Library
    participant RLEnv as ManagerBasedRLEnvWarp
    participant MCS as ManagerCallSwitch
    participant WGC as WarpGraphCache
    participant AM as ActionManager
    participant Scene as InteractiveSceneWarp
    participant Sim as SimulationContext
    participant TM as TerminationManager
    participant ReM as RewardManager
    participant EM as EventManager
    participant OM as ObservationManager

    RL->>RLEnv: step(action)
    RLEnv->>RLEnv: copy action → _action_in_wp (stable ptr)
    RLEnv->>MCS: call_stage("ActionManager_process_action")
    MCS->>WGC: capture_or_replay (mode=2)
    WGC->>AM: process_action(_action_in_wp)

    loop cfg.decimation
        RLEnv->>MCS: call_stage("ActionManager_apply_action")
        MCS->>WGC: capture_or_replay
        WGC->>AM: apply_action()
        RLEnv->>MCS: call_stage("Scene_write_data_to_sim") [capped mode=1]
        MCS->>Scene: write_data_to_sim() [eager, not captured]
        RLEnv->>Sim: step()
        RLEnv->>Scene: update(dt)
    end

    RLEnv->>RLEnv: episode_length_buf += 1
    RLEnv->>MCS: call_stage("TerminationManager_compute")
    MCS->>WGC: capture_or_replay
    WGC->>TM: compute() → dones_wp

    RLEnv->>MCS: call_stage("RewardManager_compute")
    MCS->>WGC: capture_or_replay
    WGC->>ReM: compute(dt) → reward_buf

    RLEnv->>RLEnv: reset_env_ids = reset_buf.nonzero()
    opt len(reset_env_ids) > 0
        RLEnv->>RLEnv: _reset_idx(env_ids, env_mask)
        RLEnv->>MCS: call_stage("Scene_reset")
        RLEnv->>MCS: call_stage("EventManager_apply_reset")
        RLEnv->>MCS: call_stage("ObservationManager_reset")
        RLEnv->>MCS: call_stage("TerminationManager_reset")
    end

    RLEnv->>MCS: call_stage("EventManager_apply_interval")
    MCS->>WGC: capture_or_replay
    WGC->>EM: apply(mode="interval")

    RLEnv->>MCS: call_stage("ObservationManager_compute_update_history")
    MCS->>WGC: capture_or_replay
    WGC->>OM: compute(update_history=True) → obs_buf

    RLEnv-->>RL: (obs_buf, reward_buf, terminated, time_outs, extras)

Last reviewed commit: 224a902

Comment on lines +158 to +162
def _load_cfg(self, cfg_source: str | None) -> dict[str, int]:
    if cfg_source is not None and not isinstance(cfg_source, str):
        raise TypeError(f"cfg_source must be a string or None, got: {type(cfg_source)}")
    if cfg_source is None or cfg_source.strip() == "":
        return dict(self.DEFAULT_CONFIG)
Contributor

MAX_MODE_OVERRIDES not applied for default config.

When cfg_source is None (the common case without --manager_call_config), _load_cfg returns early at line 162 without applying MAX_MODE_OVERRIDES. The PR description and class comment both state that Scene_write_data_to_sim must be capped at WARP_NOT_CAPTURED (mode 1) because _apply_actuator_model is not CUDA-graph-capture-safe. With the default config {"default": 2}, get_mode_for_manager("Scene") will return 2, bypassing the cap entirely and allowing unsafe CUDA graph capture.

The fix is to apply MAX_MODE_OVERRIDES on the default-config path:

Suggested change
def _load_cfg(self, cfg_source: str | None) -> dict[str, int]:
    if cfg_source is not None and not isinstance(cfg_source, str):
        raise TypeError(f"cfg_source must be a string or None, got: {type(cfg_source)}")
    if cfg_source is None or cfg_source.strip() == "":
        return dict(self.DEFAULT_CONFIG)
def _load_cfg(self, cfg_source: str | None) -> dict[str, int]:
    if cfg_source is not None and not isinstance(cfg_source, str):
        raise TypeError(f"cfg_source must be a string or None, got: {type(cfg_source)}")
    if cfg_source is None or cfg_source.strip() == "":
        cfg = dict(self.DEFAULT_CONFIG)
        # Apply caps even for the default config
        default_mode = cfg[self.DEFAULT_KEY]
        for name, max_mode in self.MAX_MODE_OVERRIDES.items():
            resolved = cfg.get(name, default_mode)
            if resolved > max_mode:
                cfg[name] = max_mode
        return cfg

Author

Done. Restructured _load_cfg so both paths (default config and parsed env var) flow into the shared MAX_MODE_OVERRIDES application at the end.
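
A minimal sketch of that restructured flow, assuming a hypothetical "name=mode" comma-separated config format (the actual parsing and constant names may differ; only the shared cap-application step mirrors the described fix):

```python
DEFAULT_KEY = "default"            # assumed constant name
DEFAULT_CONFIG = {DEFAULT_KEY: 2}  # everything captured (mode 2) by default
MAX_MODE_OVERRIDES = {"Scene": 1}  # Scene_write_data_to_sim capped to eager


def load_cfg(cfg_source):
    """Both config paths build `cfg`, then fall through to one shared cap step."""
    cfg = dict(DEFAULT_CONFIG)
    if cfg_source is not None and cfg_source.strip() != "":
        for item in cfg_source.split(","):  # hypothetical "name=mode" format
            name, _, mode = item.partition("=")
            cfg[name.strip()] = int(mode)
    # shared MAX_MODE_OVERRIDES application -- runs for default and parsed configs alike
    default_mode = cfg[DEFAULT_KEY]
    for name, max_mode in MAX_MODE_OVERRIDES.items():
        if cfg.get(name, default_mode) > max_mode:
            cfg[name] = max_mode
    return cfg
```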

Comment on lines 160 to 161
@Timer(name="env_step", msg="Step took:", enable=True, format="us")
def step(self, action: torch.Tensor) -> VecEnvStepReturn:
Contributor

@Timer(enable=True) hardcoded to always-on for stable step().

The decorator is hardcoded to enable=True, meaning it will print timing output to stdout on every environment step during training. The module-level constant TIMER_ENABLED_STEP = False was clearly introduced to gate timing output globally, but this decorator bypasses it. All inner context-manager timers correctly respect TIMER_ENABLED_STEP (lines 184, 191, 209, etc.), making this inconsistency appear unintentional.

This will generate extremely verbose and undesirable output for all users of the stable environment.

Suggested change
@Timer(name="env_step", msg="Step took:", enable=True, format="us")
def step(self, action: torch.Tensor) -> VecEnvStepReturn:
@Timer(name="env_step", msg="Step took:", enable=TIMER_ENABLED_STEP, format="us")
def step(self, action: torch.Tensor) -> VecEnvStepReturn:

Author

Not in scope — this comment is about the stable env (source/isaaclab/isaaclab/envs/manager_based_rl_env.py), which is not modified in this PR.

@hujc7 hujc7 force-pushed the develop-manager-warp-cp branch 3 times, most recently from 6d1ac95 to 7138023 on March 9, 2026 08:26
@hujc7
Author

hujc7 commented Mar 9, 2026

@greptileai Review

self.reset_terminated = self.termination_manager.terminated
self.reset_time_outs = self.termination_manager.time_outs

@Timer(name="env_step", msg="Step took:", enable=True, format="us")
Contributor

Outer step() timer always enabled, ignores TIMER_ENABLED_STEP

The @Timer decorator on step() has enable=True hardcoded, which means the outer step timer prints "Step took: …" to stdout on every environment step during training — even when neither DEBUG_TIMER_STEP nor DEBUG_TIMERS env vars are set.

The module-level constant TIMER_ENABLED_STEP was specifically introduced to gate this output (DEBUG_TIMER_STEP = os.environ.get("DEBUG_TIMER_STEP", "0") == "1"), but it is ignored here. All inner timers (lines 230, 265, 274, etc.) correctly respect TIMER_ENABLED_STEP. The inconsistency appears unintentional and will produce extremely verbose training output for all users.

Suggested change
@Timer(name="env_step", msg="Step took:", enable=True, format="us")
@Timer(name="env_step", msg="Step took:", enable=TIMER_ENABLED_STEP, format="us")

Author

Done. Changed to enable=TIMER_ENABLED_STEP.

Comment on lines +128 to +131
"""Select call path based on mode, execute, and apply output."""
mode = self.get_mode_for_manager(self._manager_name_from_stage(stage))
if mode == ManagerCallMode.STABLE:
call, result = stable_call, self._run_call(stable_call)
Contributor

STABLE mode (mode=0) crashes unconditionally — stable_call is never provided by call sites

When mode == ManagerCallMode.STABLE, _dispatch immediately calls self._run_call(stable_call). However, stable_call defaults to None in call_stage, and every call site in manager_based_rl_env_warp.py (e.g. ActionManager_process_action, ActionManager_apply_action, Scene_write_data_to_sim, TerminationManager_compute, RewardManager_compute, ObservationManager_compute_update_history, etc.) omits stable_call entirely.

_run_call(None) will immediately raise TypeError: 'NoneType' object is not subscriptable because it indexes call["fn"]. This means configuring any manager — or "default" — to mode 0 via MANAGER_CALL_CONFIG will crash the training loop, despite STABLE mode being documented as a supported execution path that routes to isaaclab.managers.

A guard should be added so the failure is caught early and the error message is actionable:

if mode == ManagerCallMode.STABLE:
    if stable_call is None:
        raise ValueError(
            f"Stage '{stage}' requested STABLE mode but no stable_call was provided."
        )
    call, result = stable_call, self._run_call(stable_call)

Author

Done. Added a guard that raises ValueError with an actionable message when stable_call is None.

@hujc7 hujc7 force-pushed the develop-manager-warp-cp branch from 7138023 to e044ddc on March 9, 2026 09:00
@hujc7
Author

hujc7 commented Mar 9, 2026

@greptileai Review

Comment on lines +94 to +95
assert asset_cfg.joint_ids_wp is not None
assert env.rng_state_wp is not None
Contributor

assert for runtime validation is unsafe

These assert statements are used for runtime validation, but assertions are silently skipped when Python is run with the -O (optimize) flag. If a caller passes a regular isaaclab.managers.SceneEntityCfg (which lacks joint_ids_wp), or an env without rng_state_wp, the failure will produce a cryptic AttributeError instead of a clear message.

Replace with explicit exception raises:

Suggested change
assert asset_cfg.joint_ids_wp is not None
assert env.rng_state_wp is not None
if asset_cfg.joint_ids_wp is None:
    raise ValueError(
        f"reset_joints_by_offset requires an experimental SceneEntityCfg with resolved joint_ids_wp, "
        f"but got None for asset '{asset_cfg.name}'. "
        "Use isaaclab_experimental.managers.SceneEntityCfg and ensure joint_names are set."
    )
if not hasattr(env, "rng_state_wp") or env.rng_state_wp is None:
    raise AttributeError(
        "reset_joints_by_offset requires env.rng_state_wp to be initialized. "
        "Use ManagerBasedEnvWarp or ManagerBasedRLEnvWarp as the base environment."
    )

The same applies to reset_joints_by_scale at lines 153–154.

Author

Done. Replaced assert with explicit raise ValueError/AttributeError with actionable messages in both reset_joints_by_offset and reset_joints_by_scale.

Comment on lines 20 to +22

with contextlib.suppress(ImportError):
from isaaclab_experimental.envs import DirectRLEnvWarp, ManagerBasedRLEnvWarp
Contributor

TYPE_CHECKING import inside contextlib.suppress leaves names undefined for type checkers

The contextlib.suppress(ImportError) pattern at runtime is correct, but under TYPE_CHECKING, static analysis tools (mypy, pyright) also evaluate this block literally. If the import fails in that context, DirectRLEnvWarp and ManagerBasedRLEnvWarp will be undefined when the type checker evaluates the unwrapped return annotation on line 132.

The same pattern appears in source/isaaclab_rl/isaaclab_rl/rl_games/rl_games.py, source/isaaclab_rl/isaaclab_rl/sb3.py, and source/isaaclab_rl/isaaclab_rl/skrl.py.

A cleaner alternative is to use a plain try/except ImportError that assigns stub type aliases when the import fails:

if TYPE_CHECKING:
    try:
        from isaaclab_experimental.envs import DirectRLEnvWarp, ManagerBasedEnvWarp, ManagerBasedRLEnvWarp
    except ImportError:
        from isaaclab.envs import ManagerBasedRLEnv as DirectRLEnvWarp  # type: ignore[assignment]
        from isaaclab.envs import ManagerBasedRLEnv as ManagerBasedEnvWarp  # type: ignore[assignment]
        from isaaclab.envs import ManagerBasedRLEnv as ManagerBasedRLEnvWarp  # type: ignore[assignment]

Author

Low priority — not fixing in this PR. The contextlib.suppress pattern is consistent with the other RL wrappers (rl_games, sb3, skrl) and works correctly at runtime.

@hujc7 hujc7 changed the title [Experimental] Cherry-pick manager-based warp env infrastructure from dev/newton [Exp] Cherry-pick manager-based warp env infrastructure from dev/newton Mar 11, 2026
@hujc7 hujc7 force-pushed the develop-manager-warp-cp branch from e044ddc to 5f3bc76 on March 11, 2026 15:22
@hujc7 hujc7 force-pushed the develop-manager-warp-cp branch 3 times, most recently from 0b666f6 to 59d82f4 on March 12, 2026 06:05
@hujc7
Author

hujc7 commented Mar 12, 2026

Latest changes

Addressed review feedback:

  • Replaced assert with explicit raise ValueError/AttributeError in reset_joints_by_offset and reset_joints_by_scale (events.py)
  • Switched Timer import from isaaclab_experimental.utils.timer to isaaclab.utils.timer (3 files) and renamed the format= kwarg to time_unit= to match the stable API
  • Aligned manager-based Cartpole newton solver config with stable CartpolePhysicsCfg.newton (removed stale ls_iterations, ls_parallel)
  • Dropped redundant inhand manipulation commit (already in base PR [Exp] Cherry-pick direct warp envs from dev/newton #4905)

Verification

Cartpole Warp (manager-based), 300 iterations, 4096 envs, newton==1.0.0:

  • All managers running in mode 2 (WARP_CAPTURED)
  • Final episode length: 300.00 (solved)
  • Training time: 73s

@hujc7
Author

hujc7 commented Mar 12, 2026

@greptileai Review

Comment on lines +28 to +36
def capture_or_replay(self, stage: str, fn: Callable[[], Any]) -> None:
"""Capture *fn* into a CUDA graph on the first call, then replay."""
graph = self._graphs.get(stage)
if graph is None:
with wp.ScopedCapture() as capture:
fn()
self._graphs[stage] = capture.graph
else:
wp.capture_launch(graph)
Contributor

First call silently skips GPU execution

In CUDA graph capture mode (wp.ScopedCapture), operations are recorded but not executed on the GPU until the graph is explicitly launched. Because wp.capture_launch is only called on the else branch (i.e., second call onwards), the very first invocation of capture_or_replay records the operations but never launches the graph — so the GPU work for the first step is silently skipped.

This is a real divergence from ManagerCallSwitch._wp_capture_or_launch, which always calls wp.capture_launch after the capture block, guaranteeing execution even on the first call:

# ManagerCallSwitch._wp_capture_or_launch (correct)
if graph is None:
    with wp.ScopedCapture() as capture:
        result = call["fn"](...)
    self._wp_graphs[stage] = capture.graph
    self._wp_results[stage] = result
wp.capture_launch(self._wp_graphs[stage])   # always launched

The fix is to call wp.capture_launch after capture so the first call is not silently dropped:

def capture_or_replay(self, stage: str, fn: Callable[[], Any]) -> None:
    graph = self._graphs.get(stage)
    if graph is None:
        with wp.ScopedCapture() as capture:
            fn()
        self._graphs[stage] = capture.graph
    wp.capture_launch(self._graphs[stage])

The DirectRLEnvWarp.step() loop calls this for "action", "end_pre", and "end_post" stages, so every first-ever step silently drops those three GPU computations.

Author

Done. WarpGraphCache now captures AND immediately replays on first call, caches the return value, and supports args/kwargs. This ensures the graph is validated on first call and return references are stable.
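
The fixed control flow can be illustrated with a plain-Python stand-in, where a lambda plays the role of the recorded CUDA graph (in the real class, wp.ScopedCapture records GPU work and wp.capture_launch executes it; the class below is illustrative, not the actual WarpGraphCache):

```python
class GraphCacheSketch:
    """Capture-then-always-launch control flow, with lists/lambdas in place of CUDA graphs."""

    def __init__(self):
        self._graphs = {}  # stage -> recorded callable, standing in for a captured graph

    def capture_or_replay(self, stage, fn, *args, **kwargs):
        graph = self._graphs.get(stage)
        if graph is None:
            # "capture": record the call (with its pinned inputs) for later replays
            graph = lambda: fn(*args, **kwargs)
            self._graphs[stage] = graph
        # always "launch" -- the first call executes the work instead of only recording it
        return graph()
```

Replays re-run the recorded call with the inputs pinned at capture time, which is why the real infrastructure requires stable buffer pointers.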

Comment on lines +145 to +149
# TODO(jichuanh): this is problematic as warp capture requires stable pointers,
# using different masks for different managers/terms will cause problems.
# Pre-allocated env masks (shared across managers/terms via `env`).
self.ALL_ENV_MASK = wp.ones((self.num_envs,), dtype=wp.bool, device=self.device)
self.ENV_MASK = wp.zeros((self.num_envs,), dtype=wp.bool, device=self.device)
Contributor

Shared scratch mask is clobbered by nested resolve_env_mask calls

The TODO comment here flags the problem explicitly but it is a real, latent correctness bug. self.ENV_MASK is the single shared scratch_mask backing every call to resolve_env_mask. The pattern in _reset_idx is:

env_mask = self.resolve_env_mask(env_ids=env_ids)   # returns self.ENV_MASK
info = self.observation_manager.reset(env_mask=env_mask)   # env_mask IS self.ENV_MASK
info = self.action_manager.reset(env_mask=env_mask)        # same reference
info = self.event_manager.reset(env_mask=env_mask)         # same reference

If any manager term's reset implementation calls env.resolve_env_mask(env_ids=...) internally (e.g., a term that selects a sub-set of envs to act on), the shared buffer is overwritten while the outer reset loop is still holding the reference. Subsequent managers in the loop would then operate on the wrong mask.

The risk increases as new Warp terms are added. Consider allocating a per-call temporary mask, or at minimum documenting a strict contract that term implementations must not call env.resolve_env_mask from inside a reset callback.

Author

Acknowledged — latent bug with shared scratch mask. No current code path triggers nested resolve_env_mask calls. Tracked for a future fix when nested mask resolution is needed.
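
The hazard can be demonstrated with a plain-Python stand-in for the shared scratch buffer (lists instead of warp arrays; the class is illustrative, only the attribute names mirror the snippet above):

```python
class EnvSketch:
    """Shared scratch-mask hazard, illustrated with plain lists instead of warp arrays."""

    def __init__(self, num_envs):
        # single shared scratch buffer backing every resolve_env_mask call
        self.ENV_MASK = [False] * num_envs

    def resolve_env_mask(self, env_ids):
        # writes in place and returns the SAME buffer every time
        for i in range(len(self.ENV_MASK)):
            self.ENV_MASK[i] = i in env_ids
        return self.ENV_MASK


env = EnvSketch(4)
outer = env.resolve_env_mask(env_ids={0, 1})  # outer reset loop holds this reference
env.resolve_env_mask(env_ids={3})             # a nested call clobbers the scratch buffer
# `outer` now reflects {3}, not {0, 1} -- later managers in the loop see the wrong mask
```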

with Timer(
    name="action_preprocess", msg="Action preprocessing took:", enable=TIMER_ENABLED_STEP, time_unit="us"
):
    assert self._action_in_wp is not None
Contributor

assert used for runtime input validation in hot path

assert self._action_in_wp is not None is silently disabled when Python is run with the -O (optimize) flag, which is common in production training scripts. Because _action_in_wp is always initialized in __init__, this check is never triggered in practice, but if refactoring ever changes the initialization order, the failure would be silent. Replace with an explicit guard:

Suggested change
assert self._action_in_wp is not None
if self._action_in_wp is None:
    raise RuntimeError("_action_in_wp buffer is not initialized. Ensure __init__ has completed.")

Author

Done. Replaced assert with raise RuntimeError with actionable message.

@hujc7 hujc7 force-pushed the develop-manager-warp-cp branch from 59d82f4 to 0db1d9d on March 12, 2026 08:10
@hujc7
Author

hujc7 commented Mar 12, 2026

@greptileai Review

"""The IO descriptor for the action term."""
self._IO_descriptor.name = re.sub(r"([a-z])([A-Z])", r"\1_\2", self.__class__.__name__).lower()
self._IO_descriptor.full_path = f"{self.__class__.__module__}.{self.__class__.__name__}"
self._IO_descriptor.description = " ".join(self.__class__.__doc__.split())
Contributor

NoneType crash if subclass has no docstring

self.__class__.__doc__.split() will raise AttributeError: 'NoneType' object has no attribute 'split' when a concrete ActionTerm subclass has no docstring (i.e. __doc__ is None). This is a silent footgun for any custom term implementation that omits a class-level docstring.

Suggested change
self._IO_descriptor.description = " ".join(self.__class__.__doc__.split())
self._IO_descriptor.description = " ".join((self.__class__.__doc__ or "").split())

Author

Done. Added guard: (self.__class__.__doc__ or '').split() to handle subclasses without docstrings.

Comment on lines +245 to +246
action_device = action.to(self.device)
wp.copy(self._action_in_wp, wp.from_torch(action_device, dtype=wp.float32))
Contributor

Missing float32 / contiguous guard before wp.from_torch

wp.from_torch(action_device, dtype=wp.float32) creates a Warp array that shares memory with the PyTorch tensor and interprets it as float32. If action_device has a different dtype (e.g. float64 or int32), the raw memory is silently reinterpreted, producing corrupted action values.

The base class ManagerBasedEnvWarp.step() correctly guards against this:

if action_device.dtype != torch.float32:
    action_device = action_device.float()
if not action_device.is_contiguous():
    action_device = action_device.contiguous()

The same checks should be added here before the wp.from_torch / wp.copy call:

Suggested change
action_device = action.to(self.device)
wp.copy(self._action_in_wp, wp.from_torch(action_device, dtype=wp.float32))
action_device = action.to(self.device)
if action_device.dtype != torch.float32:
    action_device = action_device.float()
if not action_device.is_contiguous():
    action_device = action_device.contiguous()
wp.copy(self._action_in_wp, wp.from_torch(action_device, dtype=wp.float32))

Author

The action tensor comes from action.to(self.device) which preserves dtype/contiguity from the RL library output. All supported RL libraries (rsl_rl, rl_games, skrl, sb3) output float32 contiguous tensors. Adding an explicit guard would add overhead to the hot path for a case that doesn't arise in practice.

Collaborator

@AntoineRichard AntoineRichard left a comment


I think we can merge this in experimental without major concerns.

Collaborator

Is that needed? Is it RSL RL only? Would RL games need that? Would SKRL need it?

Author

Will fix in base PR.

Author

Done: added the same import (isaaclab_tasks_experimental) to all RL train and play scripts (sb3, skrl, rl_games) in PR #4905.

Collaborator

That's starting to be a bit broad?

Author

It was due to a crash that stopped the envs from running. Confirmed it is not needed now.

Comment on lines +608 to +610
# TODO(jichuanh): improvement can be made in two ways:
# 1. modifier specific check can be done in the modifier class
# 2. general param vs function matching check can be a common utility
Collaborator

Was that meant to be in the main observation manager?

Author

Yes, this is a minor cleanup intended for stable: it fixes a comment typo (list to listlist) and adds a TODO for improving modifier parameter validation.

Collaborator

Same comments as the other PR: #4812

Author

Already cleaned up — the Kit-only [python.pipapi] section was removed in PR #4905. The current extension.toml only has [package], [dependencies], [core], and [[python.module]].

Collaborator

Do we need a complete duplication of this here?

Author

Not a duplication — the experimental version is a warp-specific decorator (inspect_obs_term_warp) for observation terms with func(env, out, **params) -> None signature (warp writes into pre-allocated output buffer). The stable version expects func(env, **params) -> Tensor. The GenericObservationIODescriptor dataclass is reused from stable so descriptor dicts are compatible with the existing export pipeline.
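The signature difference described above can be sketched in plain Python. The names below are illustrative only (not the actual Isaac Lab APIs), and NumPy stands in for the torch/warp buffers: the stable-style term allocates and returns a tensor, while the warp-style term writes into a pre-allocated out buffer and returns None, so no allocation happens inside a captured graph.

```python
import numpy as np

# Illustrative sketch of the two observation-term signatures discussed above
# (hypothetical names, NumPy stand-in for torch/warp arrays).

# Stable-style term: func(env, **params) -> Tensor (allocates a result).
def joint_pos_stable(env, scale=1.0):
    return env["joint_pos"] * scale

# Warp-style term: func(env, out, **params) -> None (writes into a
# pre-allocated buffer, which is CUDA-graph-capture friendly).
def joint_pos_warp(env, out, scale=1.0):
    np.multiply(env["joint_pos"], scale, out=out)

env = {"joint_pos": np.array([0.1, 0.2], dtype=np.float32)}
buf = np.empty(2, dtype=np.float32)
joint_pos_warp(env, buf, scale=2.0)
print(np.allclose(buf, joint_pos_stable(env, scale=2.0)))  # True
```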

Collaborator

Do we need this here?

Collaborator

It looks like an almost exact copy of spaces.py in main

Author

Removed. Thanks for catching this!

Collaborator

This is a bit of a weird hybrid.

Author

Yeah, the signature was modified but it's not supported yet. Support should be added in a follow-up, as it does not affect most of the envs.

@hujc7 hujc7 force-pushed the develop-manager-warp-cp branch from 0db1d9d to 881741c Compare March 12, 2026 23:23
hujc7 added 2 commits March 12, 2026 16:27
Add isaaclab_experimental package with DirectRLEnvWarp base class,
InteractiveSceneWarp, and WarpGraphCache utility.

Add direct warp environments in isaaclab_tasks_experimental:
- Cartpole, Ant, Humanoid, Locomotion (base), InHand Manipulation,
  Allegro Hand, with agent configs for rsl_rl, rl_games, skrl, sb3.

Adapt to develop base class API:
- find_joints 2-value return (indices, names)
- episode_length_buf as property with in-place copy for warp sync
- _ALL_ENV_MASK on base env instead of articulation
- set_joint_effort_target_mask for CUDA graph compatibility
- _get_observations returns dict for rsl_rl wrapper

Align solver configs with stable develop PresetCfg values.
Add safe_normalize to guard against NaN from wp.normalize.
Fix reset_root to_targets computation to use actual targets.
Fix cartpole reset kernel to use parameterized joint indices.
Clean up extension.toml and setup.py dependencies.
Switch timer import to isaaclab.utils.timer.
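The safe_normalize mentioned in the commit above guards against NaN from wp.normalize on zero-length vectors. A hedged NumPy sketch of the idea (the real implementation is a Warp function; this is only the numerical pattern):

```python
import numpy as np

# Sketch of the safe_normalize idea from the commit message (NumPy stand-in).
# Normalizing a zero-length vector divides by zero and yields NaN; guarding on
# the length returns a zero vector instead.
def safe_normalize(v: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    length = np.linalg.norm(v)
    if length < eps:
        return np.zeros_like(v)
    return v / length

print(safe_normalize(np.array([3.0, 4.0])))  # [0.6 0.8]
print(safe_normalize(np.zeros(3)))           # [0. 0. 0.] rather than NaN
```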
Add experimental warp-compatible manager implementations, MDP terms,
utilities (buffers, modifiers, noise, warp kernels), ManagerCallSwitch
for eager/captured dispatch, and manager-based env orchestration.

Includes RL library wrapper updates (rsl_rl, rl_games, sb3, skrl) to
accept warp env types, and minor stable fixes (settings_manager
RuntimeError handling, observation_manager comment cleanup).
@hujc7 hujc7 force-pushed the develop-manager-warp-cp branch from 881741c to 3cd41f3 Compare March 12, 2026 23:28
@hujc7
Author

hujc7 commented Mar 12, 2026

@greptileai Review

Comment on lines +173 to +192
def resolve_asset_cfg(cfg: dict, env: ManagerBasedEnv) -> SceneEntityCfg:
    asset_cfg = None

    for value in cfg.values():
        # If it exists, the SceneEntityCfg should have been resolved by the base manager.
        if isinstance(value, SceneEntityCfg):
            asset_cfg = value
            # Check if the joint ids are not set, and if so, resolve them.
            if asset_cfg.joint_ids is None or asset_cfg.joint_ids == slice(None):
                asset_cfg.resolve_for_warp(env.scene)
            if asset_cfg.body_ids is None or asset_cfg.body_ids == slice(None):
                asset_cfg.resolve_for_warp(env.scene)
            break

    # If it doesn't exist, use the default robot entity.
    if asset_cfg is None:
        asset_cfg = SceneEntityCfg("robot")
        asset_cfg.resolve_for_warp(env.scene)

    return asset_cfg
Contributor

resolve_for_warp method does not exist — AttributeError on every call

resolve_asset_cfg calls asset_cfg.resolve_for_warp(env.scene) three times, but SceneEntityCfg (and its parent _SceneEntityCfg) only defines resolve(), not resolve_for_warp(). Any caller of this function will get:

AttributeError: 'SceneEntityCfg' object has no attribute 'resolve_for_warp'

Although resolve_asset_cfg is currently not called anywhere internally, it is publicly exported from isaaclab_experimental.utils.warp and is therefore part of the public API. The fix is to replace resolve_for_warp with resolve:

def resolve_asset_cfg(cfg: dict, env: ManagerBasedEnv) -> SceneEntityCfg:
    asset_cfg = None
    for value in cfg.values():
        if isinstance(value, SceneEntityCfg):
            asset_cfg = value
            if not getattr(asset_cfg, "_is_resolved", False):
                asset_cfg.resolve(env.scene)
            break
    if asset_cfg is None:
        asset_cfg = SceneEntityCfg("robot")
        asset_cfg.resolve(env.scene)
    return asset_cfg

Note also the existing logic unconditionally calls resolve_for_warp twice in the same loop iteration (once for joints, once for bodies), which is redundant since resolve() handles both in a single call.

Author

Removed — dead code with no callers on this branch. Also removed the associated SceneEntityCfg and ManagerBasedEnv imports.

Comment on lines +34 to +44
for deformable_object in self._deformable_objects.values():
    deformable_object.reset(env_ids)
for rigid_object in self._rigid_objects.values():
    rigid_object.reset(env_ids, env_mask=env_mask)
for surface_gripper in self._surface_grippers.values():
    surface_gripper.reset(env_ids)
for rigid_object_collection in self._rigid_object_collections.values():
    rigid_object_collection.reset(env_ids, env_mask=env_mask)
# -- sensors (no env_mask support)
for sensor in self._sensors.values():
    sensor.reset(env_ids)
Contributor

Deformable objects and surface grippers always reset all environments when called with env_mask only

When InteractiveSceneWarp.reset() is called from ManagerBasedRLEnvWarp._reset_idx via call_stage, it is invoked as scene.reset(env_mask=env_mask) with env_ids left at its default value of None. Inside the method:

for deformable_object in self._deformable_objects.values():
    deformable_object.reset(env_ids)   # env_ids=None → resets ALL environments
for surface_gripper in self._surface_grippers.values():
    surface_gripper.reset(env_ids)     # same problem

Both of these receive env_ids=None, which resets all environments rather than only the subset specified by env_mask. Articulations and rigid objects are correctly limited to the mask, but deformable bodies and surface grippers silently over-reset, corrupting the state of environments that should not have been touched.

This does not affect the reference Isaac-Cartpole-Warp-v0 task (no deformable objects or surface grippers), but is a silent correctness bug for any task that uses them. Consider either:

  • Deriving env_ids from env_mask before calling these reset methods, or
  • Documenting the limitation prominently so downstream users are not surprised.
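The first option above can be sketched in a few lines. NumPy stands in here for the torch/warp buffers, and the variable names are illustrative: derive integer env_ids from the boolean env_mask before calling the reset methods that only accept env_ids.

```python
import numpy as np

# Sketch of deriving env_ids from env_mask (NumPy stand-in for the
# torch/warp mask buffers used in the actual scene code).
env_mask = np.array([False, True, False, True, True])
env_ids = np.nonzero(env_mask)[0]

# Only the masked environments would then be passed to reset methods that
# lack env_mask support, e.g. deformable_object.reset(env_ids).
print(env_ids)  # [1 3 4]
```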

Author

Not applicable — deformable objects and surface grippers are PhysX-only assets (isaaclab_physx). Warp envs use Newton, so these collections are always empty. The loops are inherited from InteractiveScene and are no-ops.

Comment on lines +75 to +76
assert asset_cfg.joint_mask is not None
assert asset.data.joint_pos.shape[1] == asset_cfg.joint_mask.shape[0]
Contributor

assert used for runtime validation — silently disabled with -O flag

These two assert statements perform runtime validation that should be preserved in optimized builds:

assert asset_cfg.joint_mask is not None
assert asset.data.joint_pos.shape[1] == asset_cfg.joint_mask.shape[0]

Python silently skips all assert statements when run with the -O (optimize) flag, which is common in production training scripts. If joint_mask is None (e.g. a plain isaaclab.managers.SceneEntityCfg was passed instead of isaaclab_experimental.managers.SceneEntityCfg), the failure will produce an unhelpful AttributeError deep inside the Warp kernel rather than a clear message at the call site.

The same issue was already fixed in events.py. Apply the same pattern here:

Suggested change
-assert asset_cfg.joint_mask is not None
-assert asset.data.joint_pos.shape[1] == asset_cfg.joint_mask.shape[0]
+if asset_cfg.joint_mask is None:
+    raise ValueError(
+        f"joint_pos_out_of_manual_limit requires an experimental SceneEntityCfg with resolved joint_mask, "
+        f"but got None for asset '{asset_cfg.name}'. "
+        "Use isaaclab_experimental.managers.SceneEntityCfg and ensure joint_names are set."
+    )
+if asset.data.joint_pos.shape[1] != asset_cfg.joint_mask.shape[0]:
+    raise ValueError(
+        f"joint_mask shape {asset_cfg.joint_mask.shape[0]} does not match "
+        f"num_joints {asset.data.joint_pos.shape[1]} for asset '{asset_cfg.name}'."
+    )

Author

Done. Replaced assert with explicit raise ValueError with descriptive messages.
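The behavior underlying this fix is easy to demonstrate: running Python with the -O flag strips assert statements entirely, so an assert-based guard silently disappears in optimized runs. A self-contained check:

```python
import subprocess
import sys

# Demonstrates why assert-based guards were replaced with explicit raises:
# `python -O` strips assert statements, so the guard never fires.
code = "assert False, 'guard tripped'"

normal = subprocess.run([sys.executable, "-c", code])
optimized = subprocess.run([sys.executable, "-O", "-c", code])

print(normal.returncode)     # non-zero: AssertionError raised as expected
print(optimized.returncode)  # 0: the assert was stripped entirely
```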

Add an experimental manager-based Cartpole environment using the warp
manager infrastructure as a reference task for testing and benchmarking.
@hujc7 hujc7 force-pushed the develop-manager-warp-cp branch from 3cd41f3 to 224a902 Compare March 13, 2026 07:41
@hujc7
Author

hujc7 commented Mar 13, 2026

@greptileai Review

Comment on lines +331 to +337
assert self._env.rng_state_wp is not None

# iterate over all the interval terms (fixed list; captured graph-friendly)
for i, term_cfg in enumerate(self._mode_term_cfgs["interval"]):
    lower, upper = self._interval_term_ranges[i]
    if self._interval_term_is_global[i]:
        assert self._interval_global_rng_state_wp is not None
Contributor

assert in hot-path methods — inconsistent with fixes applied elsewhere in this PR

_apply_interval (and _apply_reset at line 370) still use assert for internal invariant checks:

assert self._env.rng_state_wp is not None       # line 331
assert self._interval_global_rng_state_wp is not None  # line 337
assert self._scratch_term_mask_wp is not None   # line 370

Python silently strips all assert statements when running with the -O (optimize) flag, which is common in production training scripts. These are the exact same patterns that were already fixed in this PR (in events.py and terminations.py) by replacing them with explicit raise. The inconsistency is easy to miss since these live in the manager implementation while the prior fixes were in MDP term functions.

The conditions themselves are always True under normal construction order (all fields are set in __init__ before _apply_* can be called), so these won't trigger in practice. However, if initialization order changes during future refactoring, the failure will be a cryptic TypeError/AttributeError deep inside a Warp kernel rather than a clear message at the assertion site.

For consistency with the rest of this PR, consider replacing with explicit RuntimeError raises — e.g.:

if self._env.rng_state_wp is None:
    raise RuntimeError("EventManager._apply_interval: env.rng_state_wp is not initialized.")

The same applies to line 455 (_prepare_terms), which is a one-time init path but is still an assert.


Labels

documentation (Improvements or additions to documentation), isaac-lab (Related to Isaac Lab team)

Projects

None yet

2 participants