[Exp] Cherry-pick manager-based warp env infrastructure from dev/newton #4829
hujc7 wants to merge 3 commits into isaac-sim:develop from
Conversation
@greptile review

WIP but put in review for bot review.

Greptile Summary

This PR cherry-picks the Warp manager-based environment infrastructure from dev/newton. Key observations from this review:
Confidence Score: 4/5
Important Files Changed
Sequence Diagram

```mermaid
sequenceDiagram
    participant RL as RL Library
    participant RLEnv as ManagerBasedRLEnvWarp
    participant MCS as ManagerCallSwitch
    participant WGC as WarpGraphCache
    participant AM as ActionManager
    participant Scene as InteractiveSceneWarp
    participant Sim as SimulationContext
    participant TM as TerminationManager
    participant ReM as RewardManager
    participant EM as EventManager
    participant OM as ObservationManager
    RL->>RLEnv: step(action)
    RLEnv->>RLEnv: copy action → _action_in_wp (stable ptr)
    RLEnv->>MCS: call_stage("ActionManager_process_action")
    MCS->>WGC: capture_or_replay (mode=2)
    WGC->>AM: process_action(_action_in_wp)
    loop cfg.decimation
        RLEnv->>MCS: call_stage("ActionManager_apply_action")
        MCS->>WGC: capture_or_replay
        WGC->>AM: apply_action()
        RLEnv->>MCS: call_stage("Scene_write_data_to_sim") [capped mode=1]
        MCS->>Scene: write_data_to_sim() [eager, not captured]
        RLEnv->>Sim: step()
        RLEnv->>Scene: update(dt)
    end
    RLEnv->>RLEnv: episode_length_buf += 1
    RLEnv->>MCS: call_stage("TerminationManager_compute")
    MCS->>WGC: capture_or_replay
    WGC->>TM: compute() → dones_wp
    RLEnv->>MCS: call_stage("RewardManager_compute")
    MCS->>WGC: capture_or_replay
    WGC->>ReM: compute(dt) → reward_buf
    RLEnv->>RLEnv: reset_env_ids = reset_buf.nonzero()
    opt len(reset_env_ids) > 0
        RLEnv->>RLEnv: _reset_idx(env_ids, env_mask)
        RLEnv->>MCS: call_stage("Scene_reset")
        RLEnv->>MCS: call_stage("EventManager_apply_reset")
        RLEnv->>MCS: call_stage("ObservationManager_reset")
        RLEnv->>MCS: call_stage("TerminationManager_reset")
    end
    RLEnv->>MCS: call_stage("EventManager_apply_interval")
    MCS->>WGC: capture_or_replay
    WGC->>EM: apply(mode="interval")
    RLEnv->>MCS: call_stage("ObservationManager_compute_update_history")
    MCS->>WGC: capture_or_replay
    WGC->>OM: compute(update_history=True) → obs_buf
    RLEnv-->>RL: (obs_buf, reward_buf, terminated, time_outs, extras)
```
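The dispatch order above can be sketched with a toy Python model. This is a sketch only: the stage names follow the diagram, but `ToyCallSwitch` and `toy_step` are stand-ins invented here, not the real `ManagerCallSwitch` or `ManagerBasedRLEnvWarp.step()`.

```python
# Toy model of the step() dispatch order from the sequence diagram.
# All manager work is stubbed out; only the call-stage ordering is modeled.
class ToyCallSwitch:
    def __init__(self):
        self.stages = []  # records every dispatched stage, in order

    def call_stage(self, stage):
        self.stages.append(stage)


def toy_step(switch, decimation=2, needs_reset=False):
    switch.call_stage("ActionManager_process_action")
    for _ in range(decimation):
        switch.call_stage("ActionManager_apply_action")
        switch.call_stage("Scene_write_data_to_sim")  # capped to eager (mode=1)
        # sim.step() and scene.update(dt) happen here in the real env
    switch.call_stage("TerminationManager_compute")
    switch.call_stage("RewardManager_compute")
    if needs_reset:  # only when reset_buf.nonzero() is non-empty
        for s in ("Scene_reset", "EventManager_apply_reset",
                  "ObservationManager_reset", "TerminationManager_reset"):
            switch.call_stage(s)
    switch.call_stage("EventManager_apply_interval")
    switch.call_stage("ObservationManager_compute_update_history")


switch = ToyCallSwitch()
toy_step(switch)
```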
Last reviewed commit: 224a902
```python
def _load_cfg(self, cfg_source: str | None) -> dict[str, int]:
    if cfg_source is not None and not isinstance(cfg_source, str):
        raise TypeError(f"cfg_source must be a string or None, got: {type(cfg_source)}")
    if cfg_source is None or cfg_source.strip() == "":
        return dict(self.DEFAULT_CONFIG)
```
MAX_MODE_OVERRIDES not applied for default config.
When cfg_source is None (the common case without --manager_call_config), _load_cfg returns early at line 162 without applying MAX_MODE_OVERRIDES. The PR description and class comment both state that Scene_write_data_to_sim must be capped at WARP_NOT_CAPTURED (mode 1) because _apply_actuator_model is not CUDA-graph-capture-safe. With the default config {"default": 2}, get_mode_for_manager("Scene") will return 2, bypassing the cap entirely and allowing unsafe CUDA graph capture.
The fix is to apply MAX_MODE_OVERRIDES on the default-config path:
Current:

```python
def _load_cfg(self, cfg_source: str | None) -> dict[str, int]:
    if cfg_source is not None and not isinstance(cfg_source, str):
        raise TypeError(f"cfg_source must be a string or None, got: {type(cfg_source)}")
    if cfg_source is None or cfg_source.strip() == "":
        return dict(self.DEFAULT_CONFIG)
```

Suggested:

```python
def _load_cfg(self, cfg_source: str | None) -> dict[str, int]:
    if cfg_source is not None and not isinstance(cfg_source, str):
        raise TypeError(f"cfg_source must be a string or None, got: {type(cfg_source)}")
    if cfg_source is None or cfg_source.strip() == "":
        cfg = dict(self.DEFAULT_CONFIG)
        # Apply caps even for the default config
        default_mode = cfg[self.DEFAULT_KEY]
        for name, max_mode in self.MAX_MODE_OVERRIDES.items():
            resolved = cfg.get(name, default_mode)
            if resolved > max_mode:
                cfg[name] = max_mode
        return cfg
```
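A standalone sketch of the fixed logic, runnable without the class. The `DEFAULT_CONFIG` value matches the `{"default": 2}` default discussed above and the `Scene_write_data_to_sim` cap from `MAX_MODE_OVERRIDES`; the `"Name:mode,Name:mode"` string format for `cfg_source` is a hypothetical stand-in for illustration, not the PR's actual env-var syntax.

```python
DEFAULT_CONFIG = {"default": 2}
DEFAULT_KEY = "default"
MAX_MODE_OVERRIDES = {"Scene_write_data_to_sim": 1}  # must stay eager, not graph-captured


def load_cfg(cfg_source):
    """Parse the config source, then apply mode caps on BOTH paths."""
    if cfg_source is None or cfg_source.strip() == "":
        cfg = dict(DEFAULT_CONFIG)
    else:
        # hypothetical "Name:mode,Name:mode" format, for illustration only
        cfg = {k: int(v) for k, v in (pair.split(":") for pair in cfg_source.split(","))}
    # shared cap application: runs for default and parsed configs alike
    default_mode = cfg.get(DEFAULT_KEY, 2)
    for name, max_mode in MAX_MODE_OVERRIDES.items():
        if cfg.get(name, default_mode) > max_mode:
            cfg[name] = max_mode
    return cfg
```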
Done. Restructured _load_cfg so both paths (default config and parsed env var) flow into the shared MAX_MODE_OVERRIDES application at the end.
```python
@Timer(name="env_step", msg="Step took:", enable=True, format="us")
def step(self, action: torch.Tensor) -> VecEnvStepReturn:
```
@Timer(enable=True) hardcoded to always-on for stable step().
The decorator is hardcoded to enable=True, meaning it will print timing output to stdout on every environment step during training. The module-level constant TIMER_ENABLED_STEP = False was clearly introduced to gate timing output globally, but this decorator bypasses it. All inner context-manager timers correctly respect TIMER_ENABLED_STEP (lines 184, 191, 209, etc.), making this inconsistency appear unintentional.
This will generate extremely verbose and undesirable output for all users of the stable environment.
Current:

```python
@Timer(name="env_step", msg="Step took:", enable=True, format="us")
def step(self, action: torch.Tensor) -> VecEnvStepReturn:
```

Suggested:

```python
@Timer(name="env_step", msg="Step took:", enable=TIMER_ENABLED_STEP, format="us")
def step(self, action: torch.Tensor) -> VecEnvStepReturn:
```
Not in scope — this comment is about the stable env (source/isaaclab/isaaclab/envs/manager_based_rl_env.py), which is not modified in this PR.
Force-pushed 6d1ac95 to 7138023
@greptileai Review
```python
self.reset_terminated = self.termination_manager.terminated
self.reset_time_outs = self.termination_manager.time_outs

@Timer(name="env_step", msg="Step took:", enable=True, format="us")
```
Outer step() timer always enabled, ignores TIMER_ENABLED_STEP
The @Timer decorator on step() has enable=True hardcoded, which means the outer step timer prints "Step took: …" to stdout on every environment step during training — even when neither DEBUG_TIMER_STEP nor DEBUG_TIMERS env vars are set.
The module-level constant TIMER_ENABLED_STEP was specifically introduced to gate this output (DEBUG_TIMER_STEP = os.environ.get("DEBUG_TIMER_STEP", "0") == "1"), but it is ignored here. All inner timers (lines 230, 265, 274, etc.) correctly respect TIMER_ENABLED_STEP. The inconsistency appears unintentional and will produce extremely verbose training output for all users.
Current:

```python
@Timer(name="env_step", msg="Step took:", enable=True, format="us")
```

Suggested:

```python
@Timer(name="env_step", msg="Step took:", enable=TIMER_ENABLED_STEP, format="us")
```
Done. Changed to enable=TIMER_ENABLED_STEP.
| """Select call path based on mode, execute, and apply output.""" | ||
| mode = self.get_mode_for_manager(self._manager_name_from_stage(stage)) | ||
| if mode == ManagerCallMode.STABLE: | ||
| call, result = stable_call, self._run_call(stable_call) |
STABLE mode (mode=0) crashes unconditionally — stable_call is never provided by call sites
When mode == ManagerCallMode.STABLE, _dispatch immediately calls self._run_call(stable_call). However, stable_call defaults to None in call_stage, and every call site in manager_based_rl_env_warp.py (e.g. ActionManager_process_action, ActionManager_apply_action, Scene_write_data_to_sim, TerminationManager_compute, RewardManager_compute, ObservationManager_compute_update_history, etc.) omits stable_call entirely.
_run_call(None) will immediately raise TypeError: 'NoneType' object is not subscriptable because it indexes call["fn"]. This means configuring any manager — or "default" — to mode 0 via MANAGER_CALL_CONFIG will crash the training loop, despite STABLE mode being documented as a supported execution path that routes to isaaclab.managers.
A guard should be added so the failure is caught early and the error message is actionable:
```python
if mode == ManagerCallMode.STABLE:
    if stable_call is None:
        raise ValueError(
            f"Stage '{stage}' requested STABLE mode but no stable_call was provided."
        )
    call, result = stable_call, self._run_call(stable_call)
```
Done. Added a guard that raises ValueError with an actionable message when stable_call is None.
Force-pushed 7138023 to e044ddc
@greptileai Review
```python
assert asset_cfg.joint_ids_wp is not None
assert env.rng_state_wp is not None
```
assert for runtime validation is unsafe
These assert statements are used for runtime validation, but assertions are silently skipped when Python is run with the -O (optimize) flag. If a caller passes a regular isaaclab.managers.SceneEntityCfg (which lacks joint_ids_wp), or an env without rng_state_wp, the failure will produce a cryptic AttributeError instead of a clear message.
Replace with explicit exception raises:
Current:

```python
assert asset_cfg.joint_ids_wp is not None
assert env.rng_state_wp is not None
```

Suggested:

```python
if asset_cfg.joint_ids_wp is None:
    raise ValueError(
        f"reset_joints_by_offset requires an experimental SceneEntityCfg with resolved joint_ids_wp, "
        f"but got None for asset '{asset_cfg.name}'. "
        "Use isaaclab_experimental.managers.SceneEntityCfg and ensure joint_names are set."
    )
if not hasattr(env, "rng_state_wp") or env.rng_state_wp is None:
    raise AttributeError(
        "reset_joints_by_offset requires env.rng_state_wp to be initialized. "
        "Use ManagerBasedEnvWarp or ManagerBasedRLEnvWarp as the base environment."
    )
```
The same applies to reset_joints_by_scale at lines 153–154.
Done. Replaced assert with explicit raise ValueError/AttributeError with actionable messages in both reset_joints_by_offset and reset_joints_by_scale.
```python
with contextlib.suppress(ImportError):
    from isaaclab_experimental.envs import DirectRLEnvWarp, ManagerBasedRLEnvWarp
```
TYPE_CHECKING import inside contextlib.suppress leaves names undefined for type checkers
The contextlib.suppress(ImportError) pattern at runtime is correct, but under TYPE_CHECKING, static analysis tools (mypy, pyright) also evaluate this block literally. If the import fails in that context, DirectRLEnvWarp and ManagerBasedRLEnvWarp will be undefined when the type checker evaluates the unwrapped return annotation on line 132.
The same pattern appears in source/isaaclab_rl/isaaclab_rl/rl_games/rl_games.py, source/isaaclab_rl/isaaclab_rl/sb3.py, and source/isaaclab_rl/isaaclab_rl/skrl.py.
A cleaner alternative is to use a plain try/except ImportError that assigns stub type aliases when the import fails:
```python
if TYPE_CHECKING:
    try:
        from isaaclab_experimental.envs import DirectRLEnvWarp, ManagerBasedEnvWarp, ManagerBasedRLEnvWarp
    except ImportError:
        from isaaclab.envs import ManagerBasedRLEnv as DirectRLEnvWarp  # type: ignore[assignment]
        from isaaclab.envs import ManagerBasedRLEnv as ManagerBasedEnvWarp  # type: ignore[assignment]
        from isaaclab.envs import ManagerBasedRLEnv as ManagerBasedRLEnvWarp  # type: ignore[assignment]
```
Low priority — not fixing in this PR. The contextlib.suppress pattern is consistent with the other RL wrappers (rl_games, sb3, skrl) and works correctly at runtime.
Force-pushed e044ddc to 5f3bc76
Force-pushed 0b666f6 to 59d82f4
Latest changes

Addressed review feedback:
Verification

Cartpole Warp (manager-based), 300 iterations, 4096 envs.
@greptileai Review
```python
def capture_or_replay(self, stage: str, fn: Callable[[], Any]) -> None:
    """Capture *fn* into a CUDA graph on the first call, then replay."""
    graph = self._graphs.get(stage)
    if graph is None:
        with wp.ScopedCapture() as capture:
            fn()
        self._graphs[stage] = capture.graph
    else:
        wp.capture_launch(graph)
```
First call silently skips GPU execution
In CUDA graph capture mode (wp.ScopedCapture), operations are recorded but not executed on the GPU until the graph is explicitly launched. Because wp.capture_launch is only called on the else branch (i.e., second call onwards), the very first invocation of capture_or_replay records the operations but never launches the graph — so the GPU work for the first step is silently skipped.
This is a real divergence from ManagerCallSwitch._wp_capture_or_launch, which always calls wp.capture_launch after the capture block, guaranteeing execution even on the first call:

```python
# ManagerCallSwitch._wp_capture_or_launch (correct)
if graph is None:
    with wp.ScopedCapture() as capture:
        result = call["fn"](...)
    self._wp_graphs[stage] = capture.graph
    self._wp_results[stage] = result
wp.capture_launch(self._wp_graphs[stage])  # always launched
```

The fix is to call wp.capture_launch after capture so the first call is not silently dropped:
```python
def capture_or_replay(self, stage: str, fn: Callable[[], Any]) -> None:
    graph = self._graphs.get(stage)
    if graph is None:
        with wp.ScopedCapture() as capture:
            fn()
        self._graphs[stage] = capture.graph
    wp.capture_launch(self._graphs[stage])
```

The DirectRLEnvWarp.step() loop calls this for "action", "end_pre", and "end_post" stages, so every first-ever step silently drops those three GPU computations.
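The capture-then-always-launch pattern can be exercised with a pure-Python stand-in (no Warp involved; `ToyGraphCache` is invented here, and "capture" is modeled as recording the callable without executing it, mirroring how `wp.ScopedCapture` records but does not run GPU work):

```python
class ToyGraphCache:
    """Stand-in for WarpGraphCache: 'capture' records fn, 'launch' runs it."""

    def __init__(self):
        self._graphs = {}

    def capture_or_replay(self, stage, fn):
        if stage not in self._graphs:
            self._graphs[stage] = fn  # capture: record, but do not execute
        self._graphs[stage]()  # always launch, so the first call also runs


calls = {"n": 0}
cache = ToyGraphCache()
# Both invocations launch the graph; with the buggy else-branch version,
# the first call would record only and n would end up at 1, not 2.
cache.capture_or_replay("action", lambda: calls.__setitem__("n", calls["n"] + 1))
cache.capture_or_replay("action", lambda: calls.__setitem__("n", calls["n"] + 1))
```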
Done. WarpGraphCache now captures AND immediately replays on first call, caches the return value, and supports args/kwargs. This ensures the graph is validated on first call and return references are stable.
```python
# TODO(jichuanh): this is problematic as warp capture requires stable pointers,
# using different masks for different managers/terms will cause problems.
# Pre-allocated env masks (shared across managers/terms via `env`).
self.ALL_ENV_MASK = wp.ones((self.num_envs,), dtype=wp.bool, device=self.device)
self.ENV_MASK = wp.zeros((self.num_envs,), dtype=wp.bool, device=self.device)
```
Shared scratch mask is clobbered by nested resolve_env_mask calls
The TODO comment here flags the problem explicitly but it is a real, latent correctness bug. self.ENV_MASK is the single shared scratch_mask backing every call to resolve_env_mask. The pattern in _reset_idx is:
```python
env_mask = self.resolve_env_mask(env_ids=env_ids)         # returns self.ENV_MASK
info = self.observation_manager.reset(env_mask=env_mask)  # env_mask IS self.ENV_MASK
info = self.action_manager.reset(env_mask=env_mask)       # same reference
info = self.event_manager.reset(env_mask=env_mask)        # same reference
```

If any manager term's reset implementation calls env.resolve_env_mask(env_ids=...) internally (e.g., a term that selects a sub-set of envs to act on), the shared buffer is overwritten while the outer reset loop is still holding the reference. Subsequent managers in the loop would then operate on the wrong mask.
The risk increases as new Warp terms are added. Consider allocating a per-call temporary mask, or at minimum documenting a strict contract that term implementations must not call env.resolve_env_mask from inside a reset callback.
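The aliasing hazard can be reproduced with a minimal pure-Python model of the shared scratch buffer (`ToyEnv` is invented here; a Python list stands in for the Warp mask array, but the in-place-overwrite-and-return-self pattern is the same):

```python
class ToyEnv:
    def __init__(self, num_envs):
        self.ENV_MASK = [False] * num_envs  # single shared scratch buffer

    def resolve_env_mask(self, env_ids):
        # overwrite the shared buffer in place and hand back the same object
        for i in range(len(self.ENV_MASK)):
            self.ENV_MASK[i] = i in env_ids
        return self.ENV_MASK


env = ToyEnv(4)
outer_mask = env.resolve_env_mask(env_ids={0, 1})  # outer reset wants envs 0 and 1
nested_mask = env.resolve_env_mask(env_ids={3})    # a nested call inside some term
# outer_mask now reflects the nested call ({3}), not the original request ({0, 1})
```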
Acknowledged — latent bug with shared scratch mask. No current code path triggers nested resolve_env_mask calls. Tracked for a future fix when nested mask resolution is needed.
```python
with Timer(
    name="action_preprocess", msg="Action preprocessing took:", enable=TIMER_ENABLED_STEP, time_unit="us"
):
    assert self._action_in_wp is not None
```
assert used for runtime input validation in hot path
assert self._action_in_wp is not None is silently disabled when Python is run with the -O (optimize) flag, which is common in production training scripts. Because _action_in_wp is always initialized in __init__, this check is never triggered in practice, but if refactoring ever changes the initialization order, the failure would be silent. Replace with an explicit guard:
Current:

```python
assert self._action_in_wp is not None
```

Suggested:

```python
if self._action_in_wp is None:
    raise RuntimeError("_action_in_wp buffer is not initialized. Ensure __init__ has completed.")
```
Done. Replaced assert with raise RuntimeError with actionable message.
Force-pushed 59d82f4 to 0db1d9d
@greptileai Review
| """The IO descriptor for the action term.""" | ||
| self._IO_descriptor.name = re.sub(r"([a-z])([A-Z])", r"\1_\2", self.__class__.__name__).lower() | ||
| self._IO_descriptor.full_path = f"{self.__class__.__module__}.{self.__class__.__name__}" | ||
| self._IO_descriptor.description = " ".join(self.__class__.__doc__.split()) |
NoneType crash if subclass has no docstring
self.__class__.__doc__.split() will raise AttributeError: 'NoneType' object has no attribute 'split' when a concrete ActionTerm subclass has no docstring (i.e. __doc__ is None). This is a silent footgun for any custom term implementation that omits a class-level docstring.
Current:

```python
self._IO_descriptor.description = " ".join(self.__class__.__doc__.split())
```

Suggested:

```python
self._IO_descriptor.description = " ".join((self.__class__.__doc__ or "").split())
```
Done. Added guard: (self.__class__.__doc__ or '').split() to handle subclasses without docstrings.
```python
action_device = action.to(self.device)
wp.copy(self._action_in_wp, wp.from_torch(action_device, dtype=wp.float32))
```
Missing float32 / contiguous guard before wp.from_torch
wp.from_torch(action_device, dtype=wp.float32) creates a Warp array that shares memory with the PyTorch tensor and interprets it as float32. If action_device has a different dtype (e.g. float64 or int32), the raw memory is silently reinterpreted, producing corrupted action values.
The base class ManagerBasedEnvWarp.step() correctly guards against this:
```python
if action_device.dtype != torch.float32:
    action_device = action_device.float()
if not action_device.is_contiguous():
    action_device = action_device.contiguous()
```

The same checks should be added here before the wp.from_torch / wp.copy call:
Current:

```python
action_device = action.to(self.device)
wp.copy(self._action_in_wp, wp.from_torch(action_device, dtype=wp.float32))
```

Suggested:

```python
action_device = action.to(self.device)
if action_device.dtype != torch.float32:
    action_device = action_device.float()
if not action_device.is_contiguous():
    action_device = action_device.contiguous()
wp.copy(self._action_in_wp, wp.from_torch(action_device, dtype=wp.float32))
```
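The silent-reinterpretation failure mode can be shown without Warp or Torch: reading the raw bytes of a float64 as two float32 values yields garbage rather than an error. Here `struct` stands in for the zero-copy view that `wp.from_torch(..., dtype=wp.float32)` would take over a mismatched tensor.

```python
import struct

# Raw little-endian bytes of float64 1.0, read back as two float32 values.
raw = struct.pack("<d", 1.0)
reinterpreted = struct.unpack("<2f", raw)
# Neither value is 1.0 -- the memory is silently misread, which is the same
# failure a float64 action tensor viewed as wp.float32 would produce.
```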
The action tensor comes from action.to(self.device) which preserves dtype/contiguity from the RL library output. All supported RL libraries (rsl_rl, rl_games, skrl, sb3) output float32 contiguous tensors. Adding an explicit guard would add overhead to the hot path for a case that doesn't arise in practice.
AntoineRichard left a comment
I think we can merge this in experimental without major concerns.
Is that needed? Is it RSL RL only? Would RL games need that? Would SKRL need it?
Done — added the same import isaaclab_tasks_experimental to all RL train and play scripts (sb3, skrl, rl_games) in PR #4905.
That's starting to be a bit broad?
It was due to a crash that stopped envs from running. Confirmed it is not needed now.
```python
# TODO(jichuanh): improvement can be made in two ways:
# 1. modifier specific check can be done in the modifier class
# 2. general param vs function matching check can be a common utility
```
Was that meant to be in the main observation manager?
Yes, this is a minor cleanup intended for stable — fixes a comment typo (list to list → list) and adds a TODO for improving modifier parameter validation.
Already cleaned up — the Kit-only [python.pipapi] section was removed in PR #4905. The current extension.toml only has [package], [dependencies], [core], and [[python.module]].
Do we need a complete duplication of this here?
Not a duplication — the experimental version is a warp-specific decorator (inspect_obs_term_warp) for observation terms with func(env, out, **params) -> None signature (warp writes into pre-allocated output buffer). The stable version expects func(env, **params) -> Tensor. The GenericObservationIODescriptor dataclass is reused from stable so descriptor dicts are compatible with the existing export pipeline.
Do we need this here?
It looks like an almost exact copy of spaces.py in main
Removed. Thanks for catching this!
This is a bit of weird hybrid.
Yea, the signature was modified but it's not supported. Should be added in a followup as it does not affect most of the envs.
Force-pushed 0db1d9d to 881741c
Add isaaclab_experimental package with DirectRLEnvWarp base class, InteractiveSceneWarp, and WarpGraphCache utility.

Add direct warp environments in isaaclab_tasks_experimental:
- Cartpole, Ant, Humanoid, Locomotion (base), InHand Manipulation, Allegro Hand, with agent configs for rsl_rl, rl_games, skrl, sb3.

Adapt to develop base class API:
- find_joints 2-value return (indices, names)
- episode_length_buf as property with in-place copy for warp sync
- _ALL_ENV_MASK on base env instead of articulation
- set_joint_effort_target_mask for CUDA graph compatibility
- _get_observations returns dict for rsl_rl wrapper

Align solver configs with stable develop PresetCfg values. Add safe_normalize to guard against NaN from wp.normalize. Fix reset_root to_targets computation to use actual targets. Fix cartpole reset kernel to use parameterized joint indices. Clean up extension.toml and setup.py dependencies. Switch timer import to isaaclab.utils.timer.
Add experimental warp-compatible manager implementations, MDP terms, utilities (buffers, modifiers, noise, warp kernels), ManagerCallSwitch for eager/captured dispatch, and manager-based env orchestration. Includes RL library wrapper updates (rsl_rl, rl_games, sb3, skrl) to accept warp env types, and minor stable fixes (settings_manager RuntimeError handling, observation_manager comment cleanup).
Force-pushed 881741c to 3cd41f3
@greptileai Review
```python
def resolve_asset_cfg(cfg: dict, env: ManagerBasedEnv) -> SceneEntityCfg:
    asset_cfg = None

    for value in cfg.values():
        # If it exists, the SceneEntityCfg should have been resolved by the base manager.
        if isinstance(value, SceneEntityCfg):
            asset_cfg = value
            # Check if the joint ids are not set, and if so, resolve them.
            if asset_cfg.joint_ids is None or asset_cfg.joint_ids == slice(None):
                asset_cfg.resolve_for_warp(env.scene)
            if asset_cfg.body_ids is None or asset_cfg.body_ids == slice(None):
                asset_cfg.resolve_for_warp(env.scene)
            break

    # If it doesn't exist, use the default robot entity.
    if asset_cfg is None:
        asset_cfg = SceneEntityCfg("robot")
        asset_cfg.resolve_for_warp(env.scene)

    return asset_cfg
```
resolve_for_warp method does not exist — AttributeError on every call
resolve_asset_cfg calls asset_cfg.resolve_for_warp(env.scene) three times, but SceneEntityCfg (and its parent _SceneEntityCfg) only defines resolve(), not resolve_for_warp(). Any caller of this function will get:
AttributeError: 'SceneEntityCfg' object has no attribute 'resolve_for_warp'
Although resolve_asset_cfg is currently not called anywhere internally, it is publicly exported from isaaclab_experimental.utils.warp and is therefore part of the public API. The fix is to replace resolve_for_warp with resolve:
```python
def resolve_asset_cfg(cfg: dict, env: ManagerBasedEnv) -> SceneEntityCfg:
    asset_cfg = None
    for value in cfg.values():
        if isinstance(value, SceneEntityCfg):
            asset_cfg = value
            if not getattr(asset_cfg, "_is_resolved", False):
                asset_cfg.resolve(env.scene)
            break
    if asset_cfg is None:
        asset_cfg = SceneEntityCfg("robot")
        asset_cfg.resolve(env.scene)
    return asset_cfg
```

Note also the existing logic calls resolve_for_warp twice in the same loop iteration (once for joints, once for bodies), which is redundant since resolve() handles both in a single call.
Removed — dead code with no callers on this branch. Also removed the associated SceneEntityCfg and ManagerBasedEnv imports.
```python
for deformable_object in self._deformable_objects.values():
    deformable_object.reset(env_ids)
for rigid_object in self._rigid_objects.values():
    rigid_object.reset(env_ids, env_mask=env_mask)
for surface_gripper in self._surface_grippers.values():
    surface_gripper.reset(env_ids)
for rigid_object_collection in self._rigid_object_collections.values():
    rigid_object_collection.reset(env_ids, env_mask=env_mask)
# -- sensors (no env_mask support)
for sensor in self._sensors.values():
    sensor.reset(env_ids)
```
Deformable objects and surface grippers always reset all environments when called with env_mask only
When InteractiveSceneWarp.reset() is called from ManagerBasedRLEnvWarp._reset_idx via call_stage, it is invoked as scene.reset(env_mask=env_mask) with env_ids left at its default value of None. Inside the method:
```python
for deformable_object in self._deformable_objects.values():
    deformable_object.reset(env_ids)  # env_ids=None → resets ALL environments
for surface_gripper in self._surface_grippers.values():
    surface_gripper.reset(env_ids)  # same problem
```

Both of these receive env_ids=None, which resets all environments rather than only the subset specified by env_mask. Articulations and rigid objects are correctly limited to the mask, but deformable bodies and surface grippers silently over-reset, corrupting the state of environments that should not have been touched.
This does not affect the reference Isaac-Cartpole-Warp-v0 task (no deformable objects or surface grippers), but is a silent correctness bug for any task that uses them. Consider either:
- Deriving env_ids from env_mask before calling these reset methods, or
- Documenting the limitation prominently so downstream users are not surprised.
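The first option is nearly a one-liner. A minimal sketch of deriving env_ids from a boolean per-env mask (`mask_to_env_ids` is a hypothetical helper, and plain Python lists stand in for the Warp mask; the real code would likely go through wp.to_torch and nonzero()):

```python
def mask_to_env_ids(env_mask):
    """Derive explicit env indices from a boolean per-env mask."""
    return [i for i, flagged in enumerate(env_mask) if flagged]


# only envs 1 and 3 are flagged for reset
env_ids = mask_to_env_ids([False, True, False, True])
```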
Not applicable — deformable objects and surface grippers are PhysX-only assets (isaaclab_physx). Warp envs use Newton, so these collections are always empty. The loops are inherited from InteractiveScene and are no-ops.
```python
assert asset_cfg.joint_mask is not None
assert asset.data.joint_pos.shape[1] == asset_cfg.joint_mask.shape[0]
```
assert used for runtime validation — silently disabled with -O flag
These two assert statements perform runtime validation that should be preserved in optimized builds:
```python
assert asset_cfg.joint_mask is not None
assert asset.data.joint_pos.shape[1] == asset_cfg.joint_mask.shape[0]
```
The same issue was already fixed in events.py. Apply the same pattern here:
Current:

```python
assert asset_cfg.joint_mask is not None
assert asset.data.joint_pos.shape[1] == asset_cfg.joint_mask.shape[0]
```

Suggested:

```python
if asset_cfg.joint_mask is None:
    raise ValueError(
        f"joint_pos_out_of_manual_limit requires an experimental SceneEntityCfg with resolved joint_mask, "
        f"but got None for asset '{asset_cfg.name}'. "
        "Use isaaclab_experimental.managers.SceneEntityCfg and ensure joint_names are set."
    )
if asset.data.joint_pos.shape[1] != asset_cfg.joint_mask.shape[0]:
    raise ValueError(
        f"joint_mask shape {asset_cfg.joint_mask.shape[0]} does not match "
        f"num_joints {asset.data.joint_pos.shape[1]} for asset '{asset_cfg.name}'."
    )
```
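Since this assert-to-raise pattern recurs across several files in the PR, a tiny shared helper would keep the checks uniform and immune to `python -O` stripping. A sketch (the `require` helper is a hypothetical name, not something the PR adds):

```python
def require(condition, message):
    """Runtime validation that survives `python -O`, unlike a bare assert."""
    if not condition:
        raise ValueError(message)


# usage sketch for the checks above (asset/asset_cfg are assumed to exist):
# require(asset_cfg.joint_mask is not None,
#         f"joint_mask not resolved for asset '{asset_cfg.name}'")
```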
Done. Replaced assert with explicit raise ValueError with descriptive messages.
Add an experimental manager-based Cartpole environment using the warp manager infrastructure as a reference task for testing and benchmarking.
Force-pushed 3cd41f3 to 224a902
@greptileai Review
```python
assert self._env.rng_state_wp is not None

# iterate over all the interval terms (fixed list; captured graph-friendly)
for i, term_cfg in enumerate(self._mode_term_cfgs["interval"]):
    lower, upper = self._interval_term_ranges[i]
    if self._interval_term_is_global[i]:
        assert self._interval_global_rng_state_wp is not None
```
assert in hot-path methods — inconsistent with fixes applied elsewhere in this PR
_apply_interval (and _apply_reset at line 370) still use assert for internal invariant checks:
```python
assert self._env.rng_state_wp is not None              # line 331
assert self._interval_global_rng_state_wp is not None  # line 337
assert self._scratch_term_mask_wp is not None          # line 370
```
The conditions themselves are always True under normal construction order (all fields are set in __init__ before _apply_* can be called), so these won't trigger in practice. However, if initialization order changes during future refactoring, the failure will be a cryptic TypeError/AttributeError deep inside a Warp kernel rather than a clear message at the assertion site.
For consistency with the rest of this PR, consider replacing with explicit RuntimeError raises — e.g.:
```python
if self._env.rng_state_wp is None:
    raise RuntimeError("EventManager._apply_interval: env.rng_state_wp is not initialized.")
```

The same applies to line 455 (_prepare_terms), which is a one-time init path but is still an assert.
Summary

Cherry-pick of warp manager-based env infrastructure from dev/newton, refactored for develop.

isaaclab_experimental
- Warp-compatible managers (ActionManager, ObservationManager, EventManager, CommandManager, TerminationManager, RewardManager) with Warp kernel execution and CUDA graph capture support.
- ManagerCallSwitch utility for per-manager eager/captured dispatch, configured via MANAGER_CALL_CONFIG env var.
- ManagerBasedEnvWarp and ManagerBasedRLEnvWarp orchestration env classes.
- SceneEntityCfg with warp joint mask/ids for kernel-level joint selection.
- ManagerBase for automatic SceneEntityCfg resolution.

isaaclab_tasks_experimental
- Isaac-Cartpole-Warp-v0 task as reference environment for the warp manager-based workflow.

isaaclab_rl
- Wrappers accept ManagerBasedRLEnvWarp and DirectRLEnvWarp.

isaaclab
- SettingsManager catches RuntimeError when carb is unavailable.
- Minor cleanup in ObservationManager.

Notes
- develop-inhand-cp). Will rebase after "Adds inhand manipulation warp env" #4812 merges.
- dev/newton). Refactored ManagerCallSwitch (env-var config, single-call dispatch, cached capture results), removed stable-path duplication from call sites, integrated timer into dispatch, and dropped stable-code modifications (train.py, direct_rl_env.py, manager_base.py).

Known limitations
- Scene_write_data_to_sim capped to mode=1 (eager) via MAX_MODE_OVERRIDES: articulation _apply_actuator_model uses wp.to_torch + torch indexing, which is not CUDA graph capture-safe.

Test plan
- Isaac-Cartpole-Warp-v0 training (4096 envs, 300 iters, mode=2): converges (reward 4.95, ep_len 300)