
[Exp] Cherry-pick warp MDP migration and capture safety from dev/newton#4945

Open
hujc7 wants to merge 7 commits into isaac-sim:develop from hujc7:develop-manager-more-envs

Conversation


@hujc7 hujc7 commented Mar 11, 2026

Summary

Changes included

  • Warp-first MDP terms (observations, rewards, events, terminations, actions) for manager-based envs
  • Tested warp env configs: Ant, Humanoid, Cartpole, locomotion velocity (A1, AnymalB/C/D, Cassie, G1, Go1/2, H1), Franka/UR10 reach
  • ManagerCallSwitch max_mode cap and scene capture config
  • MDP kernels made graph-capturable with consolidated test infrastructure
  • capture_unsafe safety guards on lazy-evaluated derived properties in articulation/rigid_object data
  • WrenchComposer fix: use fresh COM pose buffers instead of stale cached link poses

Dependencies

Must be merged after these PRs (in order):

  1. [Exp] Cherry-pick direct warp envs from dev/newton #4905
  2. [Exp] Cherry-pick manager-based warp env infrastructure from dev/newton #4829

Test plan

  • Run warp env training sweep across all new manager-based env configs
  • Run test_mdp_warp_parity.py and test_mdp_warp_parity_new_terms.py
  • Run test_action_warp_parity.py
  • Verify WrenchComposer COM pose is fresh (not stale) during graph replay

@github-actions github-actions bot added the documentation (Improvements or additions to documentation) and isaac-lab (Related to Isaac Lab team) labels on Mar 11, 2026
@hujc7 hujc7 force-pushed the develop-manager-more-envs branch from f887335 to 0f63e21 on March 12, 2026 07:32

hujc7 commented Mar 12, 2026

@greptileai Review


greptile-apps bot commented Mar 12, 2026

Greptile Summary

This PR cherry-picks two dev/newton branches onto develop: Warp-first MDP migration (observations, rewards, events, terminations, actions for manager-based envs) and CUDA-graph capture safety guards. It introduces the isaaclab_experimental package with Warp-first manager implementations, a ManagerCallSwitch routing layer, and capture_unsafe decorators on lazily-evaluated data properties.

Key changes:

  • WrenchComposer fix: replaces the stale cached body_com_pose_w (Tier 2 lazy property) with direct references to the Tier 1 sim-bind buffers body_link_pose_w + body_com_pos_b; COM pose is now computed inline in Warp kernels, making it correct both in eager mode and during CUDA graph replay.
  • capture_unsafe decorator: guards all lazy TimestampedBuffer-backed properties on ArticulationData and RigidObjectData, raising RuntimeError if accessed during graph capture. This prevents silent stale-value bugs.
  • ManagerCallSwitch: routes manager stage calls through STABLE / WARP_NOT_CAPTURED / WARP_CAPTURED modes with configurable per-manager caps (e.g. Scene capped at WARP_NOT_CAPTURED).
  • Warp-first MDP terms: new observations.py, rewards.py, events.py, terminations.py, and joint_actions.py all use Tier 1 buffers and inline kernel computations.
  • One logic concern: _joint_pos_out_of_manual_limit_kernel in terminations.py is a 2D kernel where multiple threads with the same env_id write True to out[i] without atomic operations — technically a CUDA data race (though benign in practice because all writers store the same value). Restructuring as a 1D kernel with an inner joint loop (consistent with how rewards.py kernels are written) would eliminate this.
  • Minor docstring discrepancy: ManagerCallSwitch.call_stage documents timer as defaulting to True while the actual default is False.

Confidence Score: 4/5

  • This PR is safe to merge — it is experimental-package code with no breaking changes to stable Isaac Lab APIs, and the one identified logic concern (non-atomic writes in a 2D termination kernel) is benign in current CUDA hardware practice.
  • The core correctness fixes (WrenchComposer stale-COM fix, capture_unsafe guards, COM-frame kernel migration) are well-reasoned and verified against the Newton data layer. The Warp-first MDP terms follow consistent patterns and are backed by parity tests. Score is 4 rather than 5 because _joint_pos_out_of_manual_limit_kernel has a formal CUDA data race that, while harmless today, should be fixed before this pattern is reused in other kernels.
  • Pay close attention to source/isaaclab_experimental/isaaclab_experimental/envs/mdp/terminations.py (2D kernel race condition) and source/isaaclab_experimental/isaaclab_experimental/utils/manager_call_switch.py (docstring / DEFAULT_KEY style issues).

Important Files Changed

Filename Overview
source/isaaclab/isaaclab/utils/wrench_composer.py Refactored to hold Tier 1 sim-bind buffers (body_link_pose_w, body_com_pos_b) directly instead of caching the lazily-evaluated body_com_pose_w. Also restructures reset() to handle env_mask, env_ids, and the all-reset fast-path in the correct priority order. Removes the stale _link_poses_updated flag. Changes look correct; the fix eliminates graph-replay use of stale COM poses.
source/isaaclab/isaaclab/utils/warp/kernels.py Kernel helpers migrated from link-frame to COM-frame: helper _com_pose_from_link added; cast_to_link_frame / cast_force_to_link_frame / cast_torque_to_link_frame replaced by COM-frame equivalents. All four add/set_forces_and_torques_at_position_* kernels now accept body_link_pose_w + body_com_pos_b and compute COM pose inline. Pre-existing torque-overwrite caveat documented via NOTE comments.
source/isaaclab/isaaclab/utils/warp/utils.py New file adding resolve_1d_mask (allocation-free ids/mask → wp.bool mask) and the capture_unsafe decorator that raises RuntimeError when a guarded callable is invoked during CUDA graph capture. Both utilities are well-designed; the capture guard correctly raises only when wp.get_device().is_capturing is True.
source/isaaclab_newton/isaaclab_newton/assets/articulation/articulation_data.py Adds @capture_unsafe guards to all lazily-computed Tier 2 derived properties (root_link_vel_w, root_com_pose_w, body_link_vel_w, body_com_pose_w, projected_gravity_b, heading_w, root_link_lin_vel_b). This prevents silent correctness bugs during CUDA graph capture where the Python timestamp guard is invisible to the replay engine.
source/isaaclab_newton/isaaclab_newton/kernels/state_kernels.py New file with three @wp.func helpers (rotate_vec_to_body_frame, body_lin_vel_from_root, body_ang_vel_from_root). Conventions are consistent with Newton's [linear; angular] spatial-vector layout (spatial_top = linear, spatial_bottom = angular). Clean and correct.
source/isaaclab_experimental/isaaclab_experimental/utils/manager_call_switch.py New ManagerCallSwitch class routing manager stage calls through STABLE / WARP_NOT_CAPTURED / WARP_CAPTURED modes with per-manager max-mode caps baked at init time. Two style issues: the call_stage docstring incorrectly documents timer as defaulting to True (actual default: False), and get_mode_for_manager uses next(iter(self.DEFAULT_CONFIG)) instead of the DEFAULT_KEY class constant.
source/isaaclab_experimental/isaaclab_experimental/envs/mdp/terminations.py New Warp-first termination terms. _joint_pos_out_of_manual_limit_kernel uses a 2D kernel launch and writes out[i] = True from multiple concurrent threads (same env_id, different joint_id) without atomic operations — a write-write race condition per the CUDA memory model. In practice safe because all writers store the same value, but should be restructured as a 1D kernel with an inner joint loop for correctness.
source/isaaclab_experimental/isaaclab_experimental/envs/mdp/events.py New Warp-first event terms (COM randomization, external force/torque, root/joint reset). Function-level scratch buffer allocation pattern (hasattr(fn, "_scratch_...")) is retained — matches stable MDP conventions. COM randomization adds offsets to the current buffer, matching the stable implementation's documented "one-time use" behaviour.
source/isaaclab_experimental/isaaclab_experimental/envs/mdp/observations.py New Warp-first observation terms. Correctly uses Tier 1 buffers (root_link_pose_w, root_com_vel_w) and the new state_kernels helpers to inline COM-frame projections, avoiding lazy Tier 2 properties that are not capture-safe. Design is clean and parity-tested.
source/isaaclab_experimental/isaaclab_experimental/envs/mdp/rewards.py New Warp-first reward terms. 1D kernels with inner joint loops (correctly avoiding 2D race conditions seen in terminations). Correctly reads Tier 1 buffers and uses wp.atomic_add in the reward manager for per-term episode sum accumulation.
source/isaaclab_experimental/isaaclab_experimental/managers/observation_manager.py New Warp-first observation manager. Implements out-buffer pre-allocation, history via CircularBuffer, clip/scale via Warp kernels, and noise/modifier support. Large file (~940 lines) with substantial complexity; no clear logic bugs found.
source/isaaclab_experimental/isaaclab_experimental/managers/reward_manager.py New Warp-first reward manager. Uses stride-based views into a contiguous (num_terms, num_envs) buffer for per-term outputs, with a single _reward_finalize kernel for weighted accumulation. wp.atomic_add used correctly for episode sum logging. Step reward convention (rate raw*weight, not val/dt) matches stable semantics per comments.
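The reward manager's stride-based views into one contiguous buffer can be illustrated with stdlib memoryviews standing in for Warp array views; the shapes and weights below are made up for the example.

```python
from array import array

num_terms, num_envs = 3, 4

# One contiguous (num_terms * num_envs) buffer; each term writes through a
# non-copying slice view, so a single finalize pass can read every term.
buf = array("f", [0.0] * (num_terms * num_envs))
mv = memoryview(buf)
term_views = [mv[t * num_envs:(t + 1) * num_envs] for t in range(num_terms)]

# Term 1 writes its raw per-env reward through its view.
term_views[1][:] = array("f", [2.0] * num_envs)

# Weighted accumulation, mimicking the single _reward_finalize kernel.
weights = [1.0, 0.5, 0.0]
total = [
    sum(w * view[i] for w, view in zip(weights, term_views))
    for i in range(num_envs)
]
```

Because the views alias the backing buffer, the writes made per term are immediately visible to the finalize step without any copies.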

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Manager Stage Call] --> B{ManagerCallSwitch\nget_mode_for_manager}
    B -->|STABLE mode=0| C[isaaclab.managers\nPyTorch path]
    B -->|WARP_NOT_CAPTURED mode=1| D[isaaclab_experimental.managers\nWarp eager]
    B -->|WARP_CAPTURED mode=2| E{Graph cached?}
    E -->|No - first call| F[wp.ScopedCapture\ncapture graph + result]
    E -->|Yes - replay| G[wp.capture_launch\ncached result]
    F --> G

    D --> H[MDP Term Functions\nobs / rewards / events\nterminations / actions]
    G --> H

    H --> I{Access data property}
    I -->|Tier 1 sim-bind\nroot_link_pose_w\njoint_pos etc.| J[✅ Capture-safe\ndirect buffer]
    I -->|Tier 2 lazy-derived\nprojected_gravity_b\nroot_com_pose_w etc.| K[capture_unsafe decorator\nRuntimeError if capturing]

    J --> L[Warp Kernel Launch]
    L --> M[WrenchComposer\nbody_link_pose_w + body_com_pos_b\ncomputed inline in kernel]

    subgraph MAX_MODE_CAPS
        N["Scene stages\ncapped at WARP_NOT_CAPTURED\n(articulation layer not capture-ready)"]
    end
    B -.-> MAX_MODE_CAPS
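The three-way routing with per-manager caps shown in the flowchart can be sketched with an IntEnum; `CallMode` and `resolve_mode` are illustrative names, not the actual ManagerCallSwitch API.

```python
from enum import IntEnum

class CallMode(IntEnum):
    """Hypothetical mirror of the three dispatch modes in the flowchart."""
    STABLE = 0
    WARP_NOT_CAPTURED = 1
    WARP_CAPTURED = 2

def resolve_mode(requested: CallMode, max_mode: CallMode) -> CallMode:
    """Cap the globally requested mode at a per-manager maximum.

    E.g. Scene stages capped at WARP_NOT_CAPTURED stay on the eager Warp
    path even when the rest of the pipeline requests graph capture.
    """
    return min(requested, max_mode)
```

With IntEnum ordering, a cap is just a `min`: a Scene manager capped at WARP_NOT_CAPTURED never enters the captured path regardless of the global request.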

Last reviewed commit: 5713cea

raise TypeError(f"Unsupported mask type: {type(mask)}")

# --- Fast path: all elements ---
if ids is None or (isinstance(ids, slice) and ids == slice(None)):
Contributor

Dead code in fast-path condition

After the normalization at lines 257–259 (if isinstance(ids, slice) and ids == slice(None): ids = None), ids can never be slice(None) by the time execution reaches line 285. The second conjunct (isinstance(ids, slice) and ids == slice(None)) is therefore unreachable dead code.

Suggested change
if ids is None or (isinstance(ids, slice) and ids == slice(None)):
if ids is None:
return all_mask

Author

Fixed — removed the entire function (dead code, no callers on this branch).

Comment on lines +75 to +77
complete = torch.zeros((N, *value.shape[1:]), dtype=torch.float32, device=device)
complete[ids] = value
return wp.from_torch(complete, dtype=dtype)
Contributor

Hardcoded torch.float32 ignores caller-supplied dtype

The intermediate complete buffer is always created with dtype=torch.float32 (line 75) regardless of the dtype parameter. When wp.from_torch(complete, dtype=dtype) is called with a non-float32 Warp dtype (e.g. wp.float16 or wp.float64), Warp will reinterpret — not convert — the memory, silently producing garbage values. The same issue exists at line 123 in make_complete_data_from_torch_dual_index.

A minimal fix is to derive the torch dtype from the Warp dtype, or assert that dtype == wp.float32 when the allocation path is taken. For example:

torch_dtype = wp.dtype_to_torch(dtype)
complete = torch.zeros((N, *value.shape[1:]), dtype=torch_dtype, device=device)

Author

Fixed — removed the entire function (dead code, no callers on this branch).

Comment on lines +21 to +22
with contextlib.suppress(ImportError):
from isaaclab_experimental.envs import DirectRLEnvWarp, ManagerBasedRLEnvWarp
Contributor

contextlib.suppress inside TYPE_CHECKING leaves names undefined for type checkers

When isaaclab_experimental is not installed, the contextlib.suppress(ImportError) silently swallows the ImportError and leaves DirectRLEnvWarp and ManagerBasedRLEnvWarp as undefined names. Because the return type annotation of unwrapped (line 135) references these names unconditionally, type checkers (mypy, pyright) will emit a NameError / unresolved-reference error when running against an environment that does not have isaaclab_experimental installed.

The standard approach for optional dependencies under TYPE_CHECKING is to provide fallback type aliases:

if TYPE_CHECKING:
    try:
        from isaaclab_experimental.envs import DirectRLEnvWarp, ManagerBasedRLEnvWarp
    except ImportError:
        from typing import Any
        DirectRLEnvWarp = Any  # type: ignore[assignment,misc]
        ManagerBasedRLEnvWarp = Any  # type: ignore[assignment,misc]

The same pattern applies to source/isaaclab_rl/isaaclab_rl/rl_games/rl_games.py (lines 55–56) and source/isaaclab_rl/isaaclab_rl/sb3.py (lines 36–37).

Author

Low priority — not fixing in this PR. The contextlib.suppress pattern is consistent with the other RL wrappers (rl_games, sb3, skrl).

@hujc7 hujc7 force-pushed the develop-manager-more-envs branch 2 times, most recently from 74b80a2 to 760951f on March 12, 2026 08:35

hujc7 commented Mar 12, 2026

@greptileai Review

Comment on lines 487 to +503

# compute COM pose from link pose and body-frame COM offset
com_pose = _com_pose_from_link(
body_link_pose_w[env_ids[tid_env], body_ids[tid_body]],
body_com_pos_b[env_ids[tid_env], body_ids[tid_body]],
)

# set the torques to the composed torque
if torques:
composed_torques_b[env_ids[tid_env], body_ids[tid_body]] = cast_torque_to_link_frame(
torques[tid_env, tid_body],
wp.transform_get_rotation(link_poses[env_ids[tid_env], body_ids[tid_body]]),
is_global,
composed_torques_b[env_ids[tid_env], body_ids[tid_body]] = cast_torque_to_com_frame(
torques[tid_env, tid_body], com_pose, is_global
)
# set the forces to the composed force, if the positions are provided, adds a torque to the composed torque
# from the force at that position.
if forces:
# set the forces to the composed force
composed_forces_b[env_ids[tid_env], body_ids[tid_body]] = cast_force_to_link_frame(
forces[tid_env, tid_body],
wp.transform_get_rotation(link_poses[env_ids[tid_env], body_ids[tid_body]]),
is_global,
composed_forces_b[env_ids[tid_env], body_ids[tid_body]] = cast_force_to_com_frame(
Contributor

Explicit torques silently overwritten when forces + positions both provided

In set_forces_and_torques_at_position_index (and the corresponding _mask variant), when a caller passes both torques and forces + positions, the torque buffer is first set from the explicit input, then immediately overwritten (not accumulated) by the position cross-product:

# step 1 — sets the torque
if torques:
    composed_torques_b[...] = cast_torque_to_com_frame(torques[...], ...)

# step 2 — silently overwrites step 1
if forces:
    if positions:
        composed_torques_b[...] = wp.cross(...)   # ← clobbers step 1

The net result is that explicitly provided torques are dropped when forces-at-position are also specified. Presumably the combined torque should be τ_explicit + r × F. The same overwrite existed in the original wp.skew() @ … version (pre-existing), but the refactoring is a good opportunity to fix it before this kernel is widely adopted. If the "last writer wins" semantics are intentional, it should be documented prominently.

Suggested fix for the set_ variant:

if forces:
    composed_forces_b[env_ids[tid_env], body_ids[tid_body]] = cast_force_to_com_frame(
        forces[tid_env, tid_body], com_pose, is_global
    )
    if positions:
        composed_torques_b[env_ids[tid_env], body_ids[tid_body]] += wp.cross(
            cast_to_com_frame(positions[tid_env, tid_body], com_pose, is_global),
            cast_force_to_com_frame(forces[tid_env, tid_body], com_pose, is_global),
        )

The same fix applies to set_forces_and_torques_at_position_mask.

Author

Pre-existing behavior — not introduced by this PR. Added a NOTE comment to both set_* variants documenting the overwrite caveat. No caller currently passes both torques and forces+positions.

Comment on lines +52 to +58
# world poses since nothing triggers the lazy recomputation. Instead, we keep the two
# Tier 1 inputs (body_link_pose_w and body_com_pos_b) and let the wrench kernels
# compute the COM pose inline. This is both correct in eager mode and CUDA-graph-
# capture safe (Tier 1 buffers are stable sim-bind pointers updated by the solver).
data = self._asset.data
if hasattr(data, "body_link_pose_w") and hasattr(data, "body_com_pos_b"):
self._body_link_pose_w = data.body_link_pose_w
Contributor

Breaking change: stricter asset compatibility check

The guard was widened from hasattr(data, "body_link_pose_w") to hasattr(data, "body_link_pose_w") and hasattr(data, "body_com_pos_b"). Any custom or third-party asset class that exposes body_link_pose_w but not body_com_pos_b will now silently fall through to the else branch and raise ValueError("Unsupported asset type: …") instead of working as before.

Since body_com_pos_b is required by the kernels, this restriction is correct. However it is an undocumented breaking change from the caller's perspective. At minimum the ValueError message should mention the missing attribute:

Suggested change
# world poses since nothing triggers the lazy recomputation. Instead, we keep the two
# Tier 1 inputs (body_link_pose_w and body_com_pos_b) and let the wrench kernels
# compute the COM pose inline. This is both correct in eager mode and CUDA-graph-
# capture safe (Tier 1 buffers are stable sim-bind pointers updated by the solver).
data = self._asset.data
if hasattr(data, "body_link_pose_w") and hasattr(data, "body_com_pos_b"):
self._body_link_pose_w = data.body_link_pose_w
if hasattr(data, "body_link_pose_w") and hasattr(data, "body_com_pos_b"):
self._body_link_pose_w = data.body_link_pose_w
self._body_com_pos_b = data.body_com_pos_b
elif hasattr(data, "body_link_pose_w"):
raise ValueError(
f"Asset type '{self._asset.__class__.__name__}' has 'body_link_pose_w' but is missing"
" 'body_com_pos_b'. Both Tier-1 buffers are required by WrenchComposer."
)
else:
raise ValueError(f"Unsupported asset type: {self._asset.__class__.__name__}")

Author

Intentional — both buffers needed for correct COM pose. Old code cached stale Tier 2 property. All IsaacLab asset types expose both.

@AntoineRichard (Collaborator) left a comment

It looks mostly safe to merge. Thanks a lot @hujc7 !

Collaborator

Remove? :D

Author

Already removed in PR #4905 — imports from stable now.

Collaborator

Remove? :D

Author

Not a duplicate — warp-specific decorator for func(env, out) -> None obs terms. Stable expects func(env) -> Tensor. Descriptor dataclass reused from stable.

Collaborator

This is still outputting forces in the body frame? If yes, all good :)

Author

Yes, body frame. The only change is computing COM pose inline from Tier 1 buffers (body_link_pose_w + body_com_pos_b) instead of caching the Tier 2 body_com_pose_w property. No change to output frame or kernel logic.
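The inline COM-pose computation mentioned in this thread amounts to rotating the body-frame COM offset into world frame and adding it to the link position, while the orientation is unchanged. A plain-Python sketch (quaternions as (x, y, z, w) tuples, matching Warp's layout; function names here are illustrative, not the kernel helpers' actual signatures):

```python
import math

def quat_rotate(q, v):
    """Rotate vector v by unit quaternion q = (x, y, z, w)."""
    x, y, z, w = q
    # t = 2 * (q_vec x v)
    tx = 2.0 * (y * v[2] - z * v[1])
    ty = 2.0 * (z * v[0] - x * v[2])
    tz = 2.0 * (x * v[1] - y * v[0])
    # v' = v + w * t + q_vec x t
    return (
        v[0] + w * tx + (y * tz - z * ty),
        v[1] + w * ty + (z * tx - x * tz),
        v[2] + w * tz + (x * ty - y * tx),
    )

def com_pose_from_link(link_pos, link_quat, com_pos_b):
    """COM world pose: link orientation, position offset rotated into world."""
    off = quat_rotate(link_quat, com_pos_b)
    com_pos = tuple(p + o for p, o in zip(link_pos, off))
    return com_pos, link_quat
```

Since both inputs are Tier 1 buffers, this computation is valid at every graph replay, which is exactly what the refactor relies on.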

hujc7 added 2 commits March 12, 2026 16:27
Add isaaclab_experimental package with DirectRLEnvWarp base class,
InteractiveSceneWarp, and WarpGraphCache utility.

Add direct warp environments in isaaclab_tasks_experimental:
- Cartpole, Ant, Humanoid, Locomotion (base), InHand Manipulation,
  Allegro Hand, with agent configs for rsl_rl, rl_games, skrl, sb3.

Adapt to develop base class API:
- find_joints 2-value return (indices, names)
- episode_length_buf as property with in-place copy for warp sync
- _ALL_ENV_MASK on base env instead of articulation
- set_joint_effort_target_mask for CUDA graph compatibility
- _get_observations returns dict for rsl_rl wrapper

Align solver configs with stable develop PresetCfg values.
Add safe_normalize to guard against NaN from wp.normalize.
Fix reset_root to_targets computation to use actual targets.
Fix cartpole reset kernel to use parameterized joint indices.
Clean up extension.toml and setup.py dependencies.
Switch timer import to isaaclab.utils.timer.
Add experimental warp-compatible manager implementations, MDP terms,
utilities (buffers, modifiers, noise, warp kernels), ManagerCallSwitch
for eager/captured dispatch, and manager-based env orchestration.

Includes RL library wrapper updates (rsl_rl, rl_games, sb3, skrl) to
accept warp env types, and minor stable fixes (settings_manager
RuntimeError handling, observation_manager comment cleanup).
@hujc7 hujc7 force-pushed the develop-manager-more-envs branch 2 times, most recently from 71388dd to 8bdb13f on March 12, 2026 23:45

hujc7 commented Mar 12, 2026

@greptileai Review

Comment on lines +230 to +240
if not hasattr(push_by_setting_velocity, "_scratch_vel"):
push_by_setting_velocity._scratch_vel = wp.zeros((env.num_envs,), dtype=wp.spatial_vectorf, device=env.device)
r = [velocity_range.get(key, (0.0, 0.0)) for key in ["x", "y", "z", "roll", "pitch", "yaw"]]
push_by_setting_velocity._lin_lo = wp.vec3f(r[0][0], r[1][0], r[2][0])
push_by_setting_velocity._lin_hi = wp.vec3f(r[0][1], r[1][1], r[2][1])
push_by_setting_velocity._ang_lo = wp.vec3f(r[3][0], r[4][0], r[5][0])
push_by_setting_velocity._ang_hi = wp.vec3f(r[3][1], r[4][1], r[5][1])

wp.launch(
kernel=_push_by_setting_velocity_kernel,
dim=env.num_envs,
Contributor

Function-level scratch buffers unsafe with varying num_envs

Multiple event functions (push_by_setting_velocity, reset_root_state_uniform, reset_joints_by_offset, reset_joints_by_scale, apply_external_force_torque) use if not hasattr(func, "_scratch_...") to lazily allocate scratch buffers as function-level attributes on their first call. This pattern has two failure modes:

  1. Larger second environment: If the function is first called with num_envs=64 (storing _scratch_vel of shape (64,)), then called again in the same Python process with num_envs=128, hasattr returns True so the old 64-element buffer is reused. The kernel launch uses dim=128, so threads with tid >= 64 will read/write out-of-bounds on _scratch_vel.

  2. Test suite leakage: Because the attribute lives on the module-level function object, it persists across test cases. Subsequent tests with a different num_envs silently inherit stale buffers.

A safe alternative is to key the cache on both the function name and env.num_envs, or store buffers on the env object:

cache_key = f"_scratch_vel_{env.num_envs}"
if not hasattr(push_by_setting_velocity, cache_key):
    setattr(push_by_setting_velocity, cache_key,
            wp.zeros((env.num_envs,), dtype=wp.spatial_vectorf, device=env.device))
scratch = getattr(push_by_setting_velocity, cache_key)

The same issue appears at lines ~77 (randomize_rigid_body_com), ~151 (apply_external_force_torque), ~340 (reset_root_state_uniform), ~460 (reset_joints_by_offset), and ~530 (reset_joints_by_scale).
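The size-keyed caching suggested above can be sketched generically; `get_scratch` and the list-based allocator are illustrative stand-ins, not the PR's API, and in the real code the allocator would be something like wp.zeros((num_envs,), ...).

```python
def get_scratch(cache: dict, name: str, num_envs: int, alloc):
    """Return a scratch buffer sized for num_envs, reallocating on size change.

    Keying the lookup on (name, num_envs) avoids the out-of-bounds hazard of
    a bare hasattr check when num_envs grows between calls in one process.
    """
    key = (name, num_envs)
    buf = cache.get(key)
    if buf is None:
        buf = alloc(num_envs)  # hypothetical allocator, e.g. wp.zeros in Warp code
        cache[key] = buf
    return buf
```

Storing the cache dict on the env object (rather than the module-level function) additionally prevents state leaking across test cases.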

Author

Stale review — scratch buffer pattern was removed in latest push.

Comment on lines +138 to +144
def wrap_to_pi(angle: float) -> float:
"""Wrap input angle (in radians) to the range [-pi, pi)."""
two_pi = 2.0 * wp.pi
wrapped_angle = angle + wp.pi
# NOTE: Use floor-based remainder semantics to match torch's `%` for negative inputs.
wrapped_angle = wrapped_angle - wp.floor(wrapped_angle / two_pi) * two_pi
return wp.where((wrapped_angle == 0) and (angle > 0), wp.pi, wrapped_angle - wp.pi)
Contributor

wrap_to_pi returns π for input π, contradicting the docstring

The docstring documents the output range as [-π, π) (half-open, π excluded). However, the edge-case guard on line 144 explicitly returns wp.pi when wrapped_angle == 0 and angle > 0, which fires for any positive odd multiple of π (e.g. angle = π, 3π, …). Callers comparing output against the documented contract will see values of exactly π in the output, which is outside [-π, π).

If the intent is to match PyTorch's atan2(sin(x), cos(x)) semantics for x = π (which yields ≈ π rather than -π due to sin(π) ≈ 1.2e-16), the docstring should say [-pi, pi] instead:

Suggested change
def wrap_to_pi(angle: float) -> float:
"""Wrap input angle (in radians) to the range [-pi, pi)."""
two_pi = 2.0 * wp.pi
wrapped_angle = angle + wp.pi
# NOTE: Use floor-based remainder semantics to match torch's `%` for negative inputs.
wrapped_angle = wrapped_angle - wp.floor(wrapped_angle / two_pi) * two_pi
return wp.where((wrapped_angle == 0) and (angle > 0), wp.pi, wrapped_angle - wp.pi)
"""Wrap input angle (in radians) to the range [-pi, pi]."""

Alternatively, if strict [-π, π) is required (e.g. to match torch.remainder-based wrapping), the edge-case guard should return -wp.pi instead of wp.pi.
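A plain-Python port of the same floor-based wrapping (math stand-ins for the wp intrinsics) makes the closed-interval behaviour easy to check:

```python
import math

def wrap_to_pi(angle: float) -> float:
    """Wrap an angle in radians to [-pi, pi]; positive odd multiples of pi map to +pi."""
    two_pi = 2.0 * math.pi
    wrapped = angle + math.pi
    # Floor-based remainder matches torch's `%` semantics for negative inputs.
    wrapped = wrapped - math.floor(wrapped / two_pi) * two_pi
    if wrapped == 0.0 and angle > 0.0:
        return math.pi  # edge case: angle was a positive odd multiple of pi
    return wrapped - math.pi
```

Running it on the boundary inputs shows wrap_to_pi(π) = π while wrap_to_pi(-π) = -π, i.e. the range is the closed interval [-π, π], consistent with the review's suggested docstring fix.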

Author

Docstring was incorrect. Fixed to [-pi, pi] to match stable isaaclab.utils.math.wrap_to_pi convention. Fix is in PR #4829 where the function is introduced.

Comment on lines +107 to +108
assert asset_cfg.joint_mask is not None
assert asset.data.joint_pos.shape[1] == asset_cfg.joint_mask.shape[0]
Contributor

assert is stripped by python -O (optimized mode)

assert statements are silently removed when Python is run with the -O / -OO optimisation flag (e.g. many CI runners and production launchers set PYTHONOPTIMIZE). If asset_cfg.joint_mask is None or the shapes mismatch, the kernel launch will raise an opaque Warp error rather than the descriptive message an explicit check would provide.

Suggested change
assert asset_cfg.joint_mask is not None
assert asset.data.joint_pos.shape[1] == asset_cfg.joint_mask.shape[0]
if asset_cfg.joint_mask is None:
raise RuntimeError(
f"joint_pos_out_of_manual_limit: 'asset_cfg.joint_mask' is None. "
"Call 'asset_cfg.resolve_for_warp(scene)' before invoking this term."
)
if asset.data.joint_pos.shape[1] != asset_cfg.joint_mask.shape[0]:
raise RuntimeError(
f"joint_pos_out_of_manual_limit: joint_pos dim 1 ({asset.data.joint_pos.shape[1]}) "
f"!= joint_mask size ({asset_cfg.joint_mask.shape[0]})."
)

Author

Replaced with explicit raise ValueError in PR #4829 where these asserts are introduced.

Add an experimental manager-based Cartpole environment using the warp
manager infrastructure as a reference task for testing and benchmarking.
@hujc7 hujc7 force-pushed the develop-manager-more-envs branch from 8bdb13f to 5713cea on March 13, 2026 07:44

hujc7 commented Mar 13, 2026

@greptileai Review

Comment on lines +95 to +99
i, j = wp.tid()
if joint_mask[j]:
v = joint_pos[i, j]
if v < lower or v > upper:
out[i] = True
Contributor

Data race: concurrent non-atomic writes in 2D kernel

_joint_pos_out_of_manual_limit_kernel is launched with dim=(num_envs, num_joints), so multiple threads sharing the same i (env_id) but with different j (joint_id) can all execute out[i] = True concurrently. Per the CUDA programming model, concurrent non-atomic writes to the same address from different threads are undefined behaviour even when all writers store the same value.

In practice CUDA hardware serialises same-address stores within a warp and the value ends up correct, but this is not guaranteed across warps or thread-blocks. The safe alternatives are:

  1. Restructure as a 1D kernel (env_id) with an inner loop over joints (mirrors how _sum_sq_masked_kernel in rewards.py is written):
@wp.kernel
def _joint_pos_out_of_manual_limit_kernel(
    joint_pos: wp.array(dtype=wp.float32, ndim=2),
    joint_mask: wp.array(dtype=wp.bool),
    lower: float,
    upper: float,
    out: wp.array(dtype=wp.bool),
):
    i = wp.tid()
    for j in range(joint_pos.shape[1]):
        if joint_mask[j]:
            v = joint_pos[i, j]
            if v < lower or v > upper:
                out[i] = True
                return
  1. Keep the 2D layout but replace the plain store with a Warp atomic (e.g. a cast-to-int wp.atomic_or).

hujc7 added 4 commits March 13, 2026 03:14
Add warp-first MDP terms (observations, rewards, events, terminations,
actions) for manager-based envs. Update manager infrastructure with
ManagerCallSwitch max_mode cap, Scene capture config, body_ids_wp
resolution, and capture safety annotations.

Add newton state kernels for body-frame computations used by MDP terms.
Fix WrenchComposer to use Tier 1 sim-bind buffers (body_link_pose_w,
body_com_pos_b) instead of caching the lazy Tier 2 body_com_pose_w
property, which became stale after the first step.

Add capture_unsafe decorator for lazy-evaluated derived properties in
articulation and rigid object data. Update wrench kernels to compute
COM pose inline from link pose and body-frame COM offset.
Add manager-based warp env configs for classic (Ant, Humanoid),
locomotion velocity (A1, AnymalB/C/D, Cassie, G1, Go1/2, H1), and
manipulation reach (Franka, UR10). Include task-specific MDP terms
for humanoid observations/rewards and locomotion rewards/terminations/
curriculums.
Add parity test infrastructure and tests organized by MDP category:
observations, rewards, terminations, events, and actions. Tests verify
warp-first implementations match stable torch baselines across three
execution modes (stable, warp uncaptured, warp CUDA-graph captured).

Extract shared fixtures and mock objects to parity_helpers.py.
@hujc7 hujc7 force-pushed the develop-manager-more-envs branch from 5713cea to 4c6c546 on March 13, 2026 10:18