[Exp] Cherry-pick warp MDP migration and capture safety from dev/newton#4945
hujc7 wants to merge 7 commits into isaac-sim:develop
Conversation
f887335 to 0f63e21
@greptileai Review
Greptile Summary
This PR cherry-picks two sets of changes from dev/newton#4945: warp MDP migration and capture safety. Key changes:
Confidence Score: 4/5
Important Files Changed
Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Manager Stage Call] --> B{ManagerCallSwitch\nget_mode_for_manager}
    B -->|STABLE mode=0| C[isaaclab.managers\nPyTorch path]
    B -->|WARP_NOT_CAPTURED mode=1| D[isaaclab_experimental.managers\nWarp eager]
    B -->|WARP_CAPTURED mode=2| E{Graph cached?}
    E -->|No - first call| F[wp.ScopedCapture\ncapture graph + result]
    E -->|Yes - replay| G[wp.capture_launch\ncached result]
    F --> G
    D --> H[MDP Term Functions\nobs / rewards / events\nterminations / actions]
    G --> H
    H --> I{Access data property}
    I -->|Tier 1 sim-bind\nroot_link_pose_w\njoint_pos etc.| J[✅ Capture-safe\ndirect buffer]
    I -->|Tier 2 lazy-derived\nprojected_gravity_b\nroot_com_pose_w etc.| K[capture_unsafe decorator\nRuntimeError if capturing]
    J --> L[Warp Kernel Launch]
    L --> M[WrenchComposer\nbody_link_pose_w + body_com_pos_b\ncomputed inline in kernel]
    subgraph MAX_MODE_CAPS
        N["Scene stages\ncapped at WARP_NOT_CAPTURED\n(articulation layer not capture-ready)"]
    end
    B -.-> MAX_MODE_CAPS
```
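The mode dispatch in the diagram above can be sketched in plain Python. The class and method names mirror the flowchart labels, but this is an illustrative assumption, not the actual `ManagerCallSwitch` implementation in `isaaclab_experimental`:

```python
# Hypothetical sketch of the three-mode dispatch shown in the flowchart.
# Mode values mirror the diagram; the real API may differ.
STABLE, WARP_NOT_CAPTURED, WARP_CAPTURED = 0, 1, 2


class ManagerCallSwitch:
    def __init__(self, max_mode=WARP_CAPTURED):
        # e.g. scene stages are capped at WARP_NOT_CAPTURED because the
        # articulation layer is not capture-ready
        self.max_mode = max_mode

    def get_mode_for_manager(self, requested):
        # a max_mode cap clamps the requested mode down, never up
        return min(requested, self.max_mode)


switch = ManagerCallSwitch(max_mode=WARP_NOT_CAPTURED)
assert switch.get_mode_for_manager(WARP_CAPTURED) == WARP_NOT_CAPTURED
assert switch.get_mode_for_manager(STABLE) == STABLE
```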
Last reviewed commit: 5713cea
```python
    raise TypeError(f"Unsupported mask type: {type(mask)}")

# --- Fast path: all elements ---
if ids is None or (isinstance(ids, slice) and ids == slice(None)):
```
Dead code in fast-path condition
After the normalization at lines 257–259 (if isinstance(ids, slice) and ids == slice(None): ids = None), ids can never be slice(None) by the time execution reaches line 285. The second conjunct (isinstance(ids, slice) and ids == slice(None)) is therefore unreachable dead code.
```diff
-    if ids is None or (isinstance(ids, slice) and ids == slice(None)):
+    if ids is None:
         return all_mask
```
Fixed — removed the entire function (dead code, no callers on this branch).
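For reference, the dead-code pattern the review describes can be reproduced with a small warp-free stand-in. The function name and list-based mask are hypothetical; only the normalization-then-fast-path structure mirrors the reviewed code:

```python
# Illustrative stand-in for the normalization + fast path the review describes:
# once slice(None) is normalized to None up front, re-checking for slice(None)
# in the fast path is unreachable dead code.
def make_mask(ids, all_mask):
    # normalization (the review's "lines 257-259")
    if isinstance(ids, slice) and ids == slice(None):
        ids = None
    # fast path: ids can no longer be slice(None) here
    if ids is None:
        return all_mask
    # slow path: build a partial mask (toy list-based stand-in)
    mask = [False] * len(all_mask)
    for i in ids:
        mask[i] = True
    return mask


assert make_mask(slice(None), [True, True]) == [True, True]
assert make_mask([0], [True, True]) == [True, False]
```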
```python
complete = torch.zeros((N, *value.shape[1:]), dtype=torch.float32, device=device)
complete[ids] = value
return wp.from_torch(complete, dtype=dtype)
```
Hardcoded torch.float32 ignores caller-supplied dtype
The intermediate complete buffer is always created with dtype=torch.float32 (line 75) regardless of the dtype parameter. When wp.from_torch(complete, dtype=dtype) is called with a non-float32 Warp dtype (e.g. wp.float16 or wp.float64), Warp will reinterpret — not convert — the memory, silently producing garbage values. The same issue exists at line 123 in make_complete_data_from_torch_dual_index.
A minimal fix is to derive the torch dtype from the Warp dtype, or assert that dtype == wp.float32 when the allocation path is taken. For example:
```python
torch_dtype = wp.dtype_to_torch(dtype)
complete = torch.zeros((N, *value.shape[1:]), dtype=torch_dtype, device=device)
```
Fixed — removed the entire function (dead code, no callers on this branch).
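The failure mode the review describes, reinterpreting rather than converting memory, can be demonstrated with the standard library alone. This is a generic illustration using `struct`, not Warp code:

```python
import struct

# Pack two float32 values, then read the same 8 bytes back two ways.
f32_bytes = struct.pack("<2f", 1.0, 2.0)

# decode as float32: correct round-trip
converted = struct.unpack("<2f", f32_bytes)
# reinterpret the same bits as one float64: garbage, not a converted value
reinterpreted = struct.unpack("<d", f32_bytes)[0]

assert converted == (1.0, 2.0)
assert reinterpreted != 1.0
```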
```python
with contextlib.suppress(ImportError):
    from isaaclab_experimental.envs import DirectRLEnvWarp, ManagerBasedRLEnvWarp
```
contextlib.suppress inside TYPE_CHECKING leaves names undefined for type checkers
When isaaclab_experimental is not installed, the contextlib.suppress(ImportError) silently swallows the ImportError and leaves DirectRLEnvWarp and ManagerBasedRLEnvWarp as undefined names. Because the return type annotation of unwrapped (line 135) references these names unconditionally, type checkers (mypy, pyright) will emit a NameError / unresolved-reference error when running against an environment that does not have isaaclab_experimental installed.
The standard approach for optional dependencies under TYPE_CHECKING is to provide fallback type aliases:
```python
if TYPE_CHECKING:
    try:
        from isaaclab_experimental.envs import DirectRLEnvWarp, ManagerBasedRLEnvWarp
    except ImportError:
        from typing import Any

        DirectRLEnvWarp = Any  # type: ignore[assignment,misc]
        ManagerBasedRLEnvWarp = Any  # type: ignore[assignment,misc]
```

The same pattern applies to source/isaaclab_rl/isaaclab_rl/rl_games/rl_games.py (lines 55–56) and source/isaaclab_rl/isaaclab_rl/sb3.py (lines 36–37).
Low priority — not fixing in this PR. The contextlib.suppress pattern is consistent with the other RL wrappers (rl_games, sb3, skrl).
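The name-left-unbound behavior the review points out is easy to reproduce at runtime as well. The module name below is deliberately nonexistent (hypothetical), so the import always fails:

```python
import contextlib

# suppress(ImportError) swallows the failure and leaves the name unbound.
with contextlib.suppress(ImportError):
    from nonexistent_hypothetical_pkg import SomeClass  # hypothetical module

try:
    SomeClass  # noqa: F821  (unbound when the import was suppressed)
    name_defined = True
except NameError:
    name_defined = False

assert name_defined is False
```

Type checkers hit the same unresolved name statically, which is why the review suggests explicit `Any` fallbacks under `TYPE_CHECKING`.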
74b80a2 to 760951f
@greptileai Review
```diff
     # compute COM pose from link pose and body-frame COM offset
     com_pose = _com_pose_from_link(
         body_link_pose_w[env_ids[tid_env], body_ids[tid_body]],
         body_com_pos_b[env_ids[tid_env], body_ids[tid_body]],
     )

     # set the torques to the composed torque
     if torques:
-        composed_torques_b[env_ids[tid_env], body_ids[tid_body]] = cast_torque_to_link_frame(
-            torques[tid_env, tid_body],
-            wp.transform_get_rotation(link_poses[env_ids[tid_env], body_ids[tid_body]]),
-            is_global,
+        composed_torques_b[env_ids[tid_env], body_ids[tid_body]] = cast_torque_to_com_frame(
+            torques[tid_env, tid_body], com_pose, is_global
         )
     # set the forces to the composed force, if the positions are provided, adds a torque to the composed torque
     # from the force at that position.
     if forces:
         # set the forces to the composed force
-        composed_forces_b[env_ids[tid_env], body_ids[tid_body]] = cast_force_to_link_frame(
-            forces[tid_env, tid_body],
-            wp.transform_get_rotation(link_poses[env_ids[tid_env], body_ids[tid_body]]),
-            is_global,
+        composed_forces_b[env_ids[tid_env], body_ids[tid_body]] = cast_force_to_com_frame(
```
Explicit torques silently overwritten when forces + positions both provided
In set_forces_and_torques_at_position_index (and the corresponding _mask variant), when a caller passes both torques and forces + positions, the torque buffer is first set from the explicit input, then immediately overwritten (not accumulated) by the position cross-product:
```python
# step 1 — sets the torque
if torques:
    composed_torques_b[...] = cast_torque_to_com_frame(torques[...], ...)
# step 2 — silently overwrites step 1
if forces:
    if positions:
        composed_torques_b[...] = wp.cross(...)  # ← clobbers step 1
```

The net result is that explicitly provided torques are dropped when forces-at-position are also specified. Presumably the combined torque should be τ_explicit + r × F. The same overwrite existed in the original wp.skew() @ … version (pre-existing), but the refactoring is a good opportunity to fix it before this kernel is widely adopted. If the "last writer wins" semantics are intentional, it should be documented prominently.
Suggested fix for the set_ variant:
```python
if forces:
    composed_forces_b[env_ids[tid_env], body_ids[tid_body]] = cast_force_to_com_frame(
        forces[tid_env, tid_body], com_pose, is_global
    )
    if positions:
        composed_torques_b[env_ids[tid_env], body_ids[tid_body]] += wp.cross(
            cast_to_com_frame(positions[tid_env, tid_body], com_pose, is_global),
            cast_force_to_com_frame(forces[tid_env, tid_body], com_pose, is_global),
        )
```

The same fix applies to set_forces_and_torques_at_position_mask.
Pre-existing behavior — not introduced by this PR. Added a NOTE comment to both set_* variants documenting the overwrite caveat. No caller currently passes both torques and forces+positions.
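The overwrite-vs-accumulate difference flagged in the review can be verified with a warp-free sketch. `cross()` and the tuple arithmetic stand in for `wp.cross` and Warp vector types; the function name is illustrative:

```python
# Warp-free sketch of the semantics the review flags.
def cross(a, b):
    # standard 3D cross product on tuples
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def composed_torque(tau_explicit, force, position, accumulate):
    induced = cross(position, force)  # r x F from the force at a position
    if accumulate:
        # suggested fix: tau_explicit + r x F
        return tuple(t + i for t, i in zip(tau_explicit, induced))
    # current "last writer wins": explicit torque dropped
    return induced


tau, F, r = (1.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 0.0)
assert composed_torque(tau, F, r, accumulate=False) == (1.0, 0.0, 0.0)  # tau lost
assert composed_torque(tau, F, r, accumulate=True) == (2.0, 0.0, 0.0)   # tau + r x F
```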
```python
# world poses since nothing triggers the lazy recomputation. Instead, we keep the two
# Tier 1 inputs (body_link_pose_w and body_com_pos_b) and let the wrench kernels
# compute the COM pose inline. This is both correct in eager mode and CUDA-graph-
# capture safe (Tier 1 buffers are stable sim-bind pointers updated by the solver).
data = self._asset.data
if hasattr(data, "body_link_pose_w") and hasattr(data, "body_com_pos_b"):
    self._body_link_pose_w = data.body_link_pose_w
```
Breaking change: stricter asset compatibility check
The guard was widened from hasattr(data, "body_link_pose_w") to hasattr(data, "body_link_pose_w") and hasattr(data, "body_com_pos_b"). Any custom or third-party asset class that exposes body_link_pose_w but not body_com_pos_b will now silently fall through to the else branch and raise ValueError("Unsupported asset type: …") instead of working as before.
Since body_com_pos_b is required by the kernels, this restriction is correct. However it is an undocumented breaking change from the caller's perspective. At minimum the ValueError message should mention the missing attribute:
```diff
     # world poses since nothing triggers the lazy recomputation. Instead, we keep the two
     # Tier 1 inputs (body_link_pose_w and body_com_pos_b) and let the wrench kernels
     # compute the COM pose inline. This is both correct in eager mode and CUDA-graph-
     # capture safe (Tier 1 buffers are stable sim-bind pointers updated by the solver).
     data = self._asset.data
     if hasattr(data, "body_link_pose_w") and hasattr(data, "body_com_pos_b"):
         self._body_link_pose_w = data.body_link_pose_w
+        self._body_com_pos_b = data.body_com_pos_b
+    elif hasattr(data, "body_link_pose_w"):
+        raise ValueError(
+            f"Asset type '{self._asset.__class__.__name__}' has 'body_link_pose_w' but is missing"
+            " 'body_com_pos_b'. Both Tier-1 buffers are required by WrenchComposer."
+        )
+    else:
+        raise ValueError(f"Unsupported asset type: {self._asset.__class__.__name__}")
```
Intentional — both buffers needed for correct COM pose. Old code cached stale Tier 2 property. All IsaacLab asset types expose both.
AntoineRichard left a comment:
It looks mostly safe to merge. Thanks a lot @hujc7 !
Already removed in PR #4905 — imports from stable now.
Not a duplicate — warp-specific decorator for func(env, out) -> None obs terms. Stable expects func(env) -> Tensor. Descriptor dataclass reused from stable.
This is still outputing forces in the body frame? If yes all good :)
Yes, body frame. The only change is computing COM pose inline from Tier 1 buffers (body_link_pose_w + body_com_pos_b) instead of caching the Tier 2 body_com_pose_w property. No change to output frame or kernel logic.
Add isaaclab_experimental package with DirectRLEnvWarp base class, InteractiveSceneWarp, and WarpGraphCache utility. Add direct warp environments in isaaclab_tasks_experimental: Cartpole, Ant, Humanoid, Locomotion (base), InHand Manipulation, Allegro Hand, with agent configs for rsl_rl, rl_games, skrl, sb3.

Adapt to develop base class API:
- find_joints 2-value return (indices, names)
- episode_length_buf as property with in-place copy for warp sync
- _ALL_ENV_MASK on base env instead of articulation
- set_joint_effort_target_mask for CUDA graph compatibility
- _get_observations returns dict for rsl_rl wrapper

Align solver configs with stable develop PresetCfg values. Add safe_normalize to guard against NaN from wp.normalize. Fix reset_root to_targets computation to use actual targets. Fix cartpole reset kernel to use parameterized joint indices. Clean up extension.toml and setup.py dependencies. Switch timer import to isaaclab.utils.timer.
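The safe_normalize guard mentioned in the commit message can be sketched without Warp. The epsilon threshold and the zero-vector fallback are assumptions about the actual implementation; only the motivation (wp.normalize yields NaN on a zero vector via 0/0) comes from the commit text:

```python
import math


# Hypothetical warp-free sketch of a safe_normalize guard: normalizing a
# zero vector divides by zero, so fall back to zeros below an epsilon.
def safe_normalize(v, eps=1e-12):
    norm = math.sqrt(sum(c * c for c in v))
    if norm < eps:
        return tuple(0.0 for _ in v)
    return tuple(c / norm for c in v)


assert safe_normalize((0.0, 0.0, 0.0)) == (0.0, 0.0, 0.0)  # no NaN
assert safe_normalize((3.0, 4.0, 0.0)) == (0.6, 0.8, 0.0)
```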
Add experimental warp-compatible manager implementations, MDP terms, utilities (buffers, modifiers, noise, warp kernels), ManagerCallSwitch for eager/captured dispatch, and manager-based env orchestration. Includes RL library wrapper updates (rsl_rl, rl_games, sb3, skrl) to accept warp env types, and minor stable fixes (settings_manager RuntimeError handling, observation_manager comment cleanup).
71388dd to 8bdb13f
@greptileai Review
```python
if not hasattr(push_by_setting_velocity, "_scratch_vel"):
    push_by_setting_velocity._scratch_vel = wp.zeros((env.num_envs,), dtype=wp.spatial_vectorf, device=env.device)
    r = [velocity_range.get(key, (0.0, 0.0)) for key in ["x", "y", "z", "roll", "pitch", "yaw"]]
    push_by_setting_velocity._lin_lo = wp.vec3f(r[0][0], r[1][0], r[2][0])
    push_by_setting_velocity._lin_hi = wp.vec3f(r[0][1], r[1][1], r[2][1])
    push_by_setting_velocity._ang_lo = wp.vec3f(r[3][0], r[4][0], r[5][0])
    push_by_setting_velocity._ang_hi = wp.vec3f(r[3][1], r[4][1], r[5][1])

wp.launch(
    kernel=_push_by_setting_velocity_kernel,
    dim=env.num_envs,
```
Function-level scratch buffers unsafe with varying num_envs
Multiple event functions (push_by_setting_velocity, reset_root_state_uniform, reset_joints_by_offset, reset_joints_by_scale, apply_external_force_torque) use if not hasattr(func, "_scratch_...") to lazily allocate scratch buffers as function-level attributes on their first call. This pattern has two failure modes:
1. Larger second environment: If the function is first called with num_envs=64 (storing _scratch_vel of shape (64,)), then called again in the same Python process with num_envs=128, hasattr returns True so the old 64-element buffer is reused. The kernel launch uses dim=128, so threads with tid >= 64 will read/write out-of-bounds on _scratch_vel.
2. Test suite leakage: Because the attribute lives on the module-level function object, it persists across test cases. Subsequent tests with a different num_envs silently inherit stale buffers.

A safe alternative is to key the cache on both the function name and env.num_envs, or store buffers on the env object:

```python
cache_key = f"_scratch_vel_{env.num_envs}"
if not hasattr(push_by_setting_velocity, cache_key):
    setattr(
        push_by_setting_velocity,
        cache_key,
        wp.zeros((env.num_envs,), dtype=wp.spatial_vectorf, device=env.device),
    )
scratch = getattr(push_by_setting_velocity, cache_key)
```

The same issue appears at lines ~77 (randomize_rigid_body_com), ~151 (apply_external_force_torque), ~340 (reset_root_state_uniform), ~460 (reset_joints_by_offset), and ~530 (reset_joints_by_scale).
Stale review — scratch buffer pattern was removed in latest push.
```python
def wrap_to_pi(angle: float) -> float:
    """Wrap input angle (in radians) to the range [-pi, pi)."""
    two_pi = 2.0 * wp.pi
    wrapped_angle = angle + wp.pi
    # NOTE: Use floor-based remainder semantics to match torch's `%` for negative inputs.
    wrapped_angle = wrapped_angle - wp.floor(wrapped_angle / two_pi) * two_pi
    return wp.where((wrapped_angle == 0) and (angle > 0), wp.pi, wrapped_angle - wp.pi)
```
wrap_to_pi returns +π for input +π, contradicting the docstring
The docstring documents the output range as [-π, π) (half-open, π excluded). However, the edge-case guard on line 144 explicitly returns wp.pi when wrapped_angle == 0 and angle > 0, which fires for any positive odd multiple of π (e.g. angle = π, 3π, …). Callers comparing output against the documented contract will see values of exactly +π in the output, which is outside [-π, π).
If the intent is to match PyTorch's atan2(sin(x), cos(x)) semantics for x = π (which yields ≈ π rather than -π due to sin(π) ≈ 1.2e-16), the docstring should say [-pi, pi] instead:
```diff
 def wrap_to_pi(angle: float) -> float:
-    """Wrap input angle (in radians) to the range [-pi, pi)."""
+    """Wrap input angle (in radians) to the range [-pi, pi]."""
     two_pi = 2.0 * wp.pi
     wrapped_angle = angle + wp.pi
     # NOTE: Use floor-based remainder semantics to match torch's `%` for negative inputs.
     wrapped_angle = wrapped_angle - wp.floor(wrapped_angle / two_pi) * two_pi
     return wp.where((wrapped_angle == 0) and (angle > 0), wp.pi, wrapped_angle - wp.pi)
```
Alternatively, if strict [-π, π) is required (e.g. to match torch.remainder-based wrapping), the edge-case guard should return -wp.pi instead of wp.pi.
Docstring was incorrect. Fixed to [-pi, pi] to match stable isaaclab.utils.math.wrap_to_pi convention. Fix is in PR #4829 where the function is introduced.
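The kernel's floor-remainder logic can be mirrored in pure Python with `math` in place of the `wp` builtins, which makes the +π edge case easy to check directly:

```python
import math


# Pure-Python mirror of the kernel's floor-remainder wrap, including the
# edge case that maps positive odd multiples of pi to +pi (range [-pi, pi]).
def wrap_to_pi(angle):
    two_pi = 2.0 * math.pi
    w = angle + math.pi
    w = w - math.floor(w / two_pi) * two_pi
    return math.pi if (w == 0.0 and angle > 0.0) else w - math.pi


assert wrap_to_pi(math.pi) == math.pi    # +pi stays +pi, not -pi
assert wrap_to_pi(-math.pi) == -math.pi
assert wrap_to_pi(0.0) == 0.0
assert abs(wrap_to_pi(0.5) - 0.5) < 1e-12
```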
```python
assert asset_cfg.joint_mask is not None
assert asset.data.joint_pos.shape[1] == asset_cfg.joint_mask.shape[0]
```
assert is stripped by python -O (optimized mode)
assert statements are silently removed when Python is run with the -O / -OO optimisation flag (e.g. many CI runners and production launchers set PYTHONOPTIMIZE). If asset_cfg.joint_mask is None or the shapes mismatch, the kernel launch will raise an opaque Warp error rather than the descriptive message an explicit check would provide.
```diff
-    assert asset_cfg.joint_mask is not None
-    assert asset.data.joint_pos.shape[1] == asset_cfg.joint_mask.shape[0]
+    if asset_cfg.joint_mask is None:
+        raise RuntimeError(
+            "joint_pos_out_of_manual_limit: 'asset_cfg.joint_mask' is None. "
+            "Call 'asset_cfg.resolve_for_warp(scene)' before invoking this term."
+        )
+    if asset.data.joint_pos.shape[1] != asset_cfg.joint_mask.shape[0]:
+        raise RuntimeError(
+            f"joint_pos_out_of_manual_limit: joint_pos dim 1 ({asset.data.joint_pos.shape[1]}) "
+            f"!= joint_mask size ({asset_cfg.joint_mask.shape[0]})."
+        )
```
Replaced with explicit raise ValueError in PR #4829 where these asserts are introduced.
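The assert-stripping behavior the review warns about can be demonstrated without relaunching the interpreter: the `optimize` parameter of the built-in `compile()` mimics the `-O` flag:

```python
# Asserts vanish under optimization; compile(..., optimize=1) mimics -O.
src = "assert False, 'stripped under -O'\nreached = True\n"

ns = {}
exec(compile(src, "<demo>", "exec", optimize=1), ns)
assert ns["reached"] is True  # the assert was compiled away

try:
    exec(compile(src, "<demo>", "exec", optimize=0), {})
    raised = False
except AssertionError:
    raised = True
assert raised  # default mode: the assert fires
```

This is why explicit `raise` statements are the robust choice for input validation, as the reply above notes.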
Add an experimental manager-based Cartpole environment using the warp manager infrastructure as a reference task for testing and benchmarking.
8bdb13f to 5713cea
@greptileai Review
```python
i, j = wp.tid()
if joint_mask[j]:
    v = joint_pos[i, j]
    if v < lower or v > upper:
        out[i] = True
```
Data race: concurrent non-atomic writes in 2D kernel
_joint_pos_out_of_manual_limit_kernel is launched with dim=(num_envs, num_joints), so multiple threads sharing the same i (env_id) but with different j (joint_id) can all execute out[i] = True concurrently. Per the CUDA programming model, concurrent non-atomic writes to the same address from different threads are undefined behaviour even when all writers store the same value.
In practice CUDA hardware serialises same-address stores within a warp and the value ends up correct, but this is not guaranteed across warps or thread-blocks. The safe alternatives are:
- Restructure as a 1D kernel (env_id) with an inner loop over joints (mirrors how _sum_sq_masked_kernel in rewards.py is written):

```python
@wp.kernel
def _joint_pos_out_of_manual_limit_kernel(
    joint_pos: wp.array(dtype=wp.float32, ndim=2),
    joint_mask: wp.array(dtype=wp.bool),
    lower: float,
    upper: float,
    out: wp.array(dtype=wp.bool),
):
    i = wp.tid()
    for j in range(joint_pos.shape[1]):
        if joint_mask[j]:
            v = joint_pos[i, j]
            if v < lower or v > upper:
                out[i] = True
                return
```

- Keep the 2D layout but replace the plain store with a Warp atomic (e.g. a cast-to-int wp.atomic_or).
Add warp-first MDP terms (observations, rewards, events, terminations, actions) for manager-based envs. Update manager infrastructure with ManagerCallSwitch max_mode cap, Scene capture config, body_ids_wp resolution, and capture safety annotations. Add newton state kernels for body-frame computations used by MDP terms.
Fix WrenchComposer to use Tier 1 sim-bind buffers (body_link_pose_w, body_com_pos_b) instead of caching the lazy Tier 2 body_com_pose_w property, which became stale after the first step. Add capture_unsafe decorator for lazy-evaluated derived properties in articulation and rigid object data. Update wrench kernels to compute COM pose inline from link pose and body-frame COM offset.
Add manager-based warp env configs for classic (Ant, Humanoid), locomotion velocity (A1, AnymalB/C/D, Cassie, G1, Go1/2, H1), and manipulation reach (Franka, UR10). Include task-specific MDP terms for humanoid observations/rewards and locomotion rewards/terminations/curriculums.
Add parity test infrastructure and tests organized by MDP category: observations, rewards, terminations, events, and actions. Tests verify warp-first implementations match stable torch baselines across three execution modes (stable, warp uncaptured, warp CUDA-graph captured). Extract shared fixtures and mock objects to parity_helpers.py.
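The three-mode parity pattern described above can be sketched with toy stand-ins. The function names, the mode strings, and the scalar terms are illustrative, not the real test API in parity_helpers.py:

```python
# Toy sketch of the parity pattern: run the same term under each warp mode
# and compare against the stable torch baseline (names are hypothetical).
def parity_check(term_stable, term_warp, x, tol=1e-6):
    baseline = term_stable(x)
    for mode in ("warp_uncaptured", "warp_captured"):
        out = term_warp(x, mode=mode)
        assert abs(out - baseline) <= tol, f"{mode} diverged from stable"
    return True


# stand-in term: doubling a scalar in both implementations
assert parity_check(lambda x: 2.0 * x, lambda x, mode: 2.0 * x, 3.0)
```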
5713cea to 4c6c546
Summary
Cherry-picks warp MDP migration and capture safety from dev/newton#4945 into develop.

Changes included
- ManagerCallSwitch max_mode cap and scene capture config
- capture_unsafe safety guards on lazy-evaluated derived properties in articulation/rigid_object data

Dependencies
Must be merged after these PRs (in order):

Test plan
- test_mdp_warp_parity.py and test_mdp_warp_parity_new_terms.py
- test_action_warp_parity.py