Disables USD notice handlers during usd_replicate for 25-35% faster startup#5070
Disables USD notice handlers during usd_replicate for 25-35% faster startup#5070aloysbaillet wants to merge 1 commit intoisaac-sim:developfrom
Conversation
Greptile SummaryThis PR delivers a meaningful startup-time optimization (25–35% reduction for large environments) by mirroring what IsaacSim's own Key changes:
Issues found:
Confidence Score: 3/5
Important Files Changed
Sequence DiagramsequenceDiagram
participant Caller
participant usd_replicate
participant _disable_usd_notice_handlers
participant SimulationManager
participant UsdUtils_StageCache
participant Sdf
Caller->>usd_replicate: usd_replicate(stage, sources, destinations, ...)
usd_replicate->>_disable_usd_notice_handlers: _disable_usd_notice_handlers(stage)
_disable_usd_notice_handlers->>UsdUtils_StageCache: Insert(stage) → stage_id
_disable_usd_notice_handlers->>SimulationManager: is_fabric_usd_notice_handler_enabled(stage_id)
SimulationManager-->>_disable_usd_notice_handlers: fabric_was_enabled
_disable_usd_notice_handlers->>SimulationManager: enable_fabric_usd_notice_handler(stage_id, False)
_disable_usd_notice_handlers->>SimulationManager: enable_usd_notice_handler(False)
_disable_usd_notice_handlers-->>usd_replicate: state dict
rect rgb(200, 230, 200)
Note over usd_replicate,Sdf: try block — bulk USD authoring with handlers suppressed
usd_replicate->>Sdf: ChangeBlock (single block wrapping all depths)
loop for each depth × source × env_id
usd_replicate->>Sdf: CreatePrimInLayer(rl, dp)
usd_replicate->>Sdf: CopySpec(src → dp)
usd_replicate->>Sdf: author xformOp:translate / xformOp:orient
end
Sdf-->>usd_replicate: ChangeBlock closes → single batch notify
end
rect rgb(255, 220, 220)
Note over usd_replicate,SimulationManager: finally block — always restore handlers
usd_replicate->>SimulationManager: enable_fabric_usd_notice_handler(stage_id, True) [if was_enabled]
usd_replicate->>SimulationManager: enable_usd_notice_handler(True) [unconditional ⚠️]
end
usd_replicate-->>Caller: return
Last reviewed commit: "perf: disable USD no..." |
| state["fabric_notice_was_enabled"] = SimulationManager.is_fabric_usd_notice_handler_enabled(stage_id) | ||
| if state["fabric_notice_was_enabled"]: | ||
| SimulationManager.enable_fabric_usd_notice_handler(stage_id, False) | ||
| SimulationManager.enable_usd_notice_handler(False) |
There was a problem hiding this comment.
USD notice handler state not saved before disabling
SimulationManager.enable_usd_notice_handler(False) is called unconditionally without first checking whether it was already enabled. In _enable_usd_notice_handlers, it is then always restored to True — regardless of what its state was before usd_replicate() was called.
Contrast this with the fabric notice handler, where the initial state is checked with is_fabric_usd_notice_handler_enabled() and only restored if it was enabled. If any outer code (e.g. another nested cloning operation or a user-managed disable context) had already set the USD notice handler to False, calling usd_replicate() would silently re-enable it in the finally block — a side effect the caller would not expect.
The fix is to save and conditionally restore the USD notice handler state, mirroring the fabric handler pattern:
try:
from isaacsim.core.simulation_manager.impl.simulation_manager import SimulationManager
state["fabric_notice_was_enabled"] = SimulationManager.is_fabric_usd_notice_handler_enabled(stage_id)
if state["fabric_notice_was_enabled"]:
SimulationManager.enable_fabric_usd_notice_handler(stage_id, False)
# Save and conditionally disable the USD notice handler as well
state["usd_notice_was_enabled"] = SimulationManager.is_usd_notice_handler_enabled()
if state["usd_notice_was_enabled"]:
SimulationManager.enable_usd_notice_handler(False)
state["sim_manager_available"] = TrueAnd in _enable_usd_notice_handlers:
if state.get("usd_notice_was_enabled", False):
SimulationManager.enable_usd_notice_handler(True)(This assumes SimulationManager.is_usd_notice_handler_enabled() exists; if not, at minimum the existing asymmetry relative to the fabric handler should be documented.)
| stage_id = UsdUtils.StageCache.Get().Insert(stage).ToLongInt() | ||
| state["stage_id"] = stage_id |
There was a problem hiding this comment.
Stage inserted into global cache without cleanup
UsdUtils.StageCache.Get().Insert(stage) inserts the stage into the global, process-wide USD stage cache. If the stage was not already registered in the cache, this call permanently adds it — and there is no corresponding removal, either in _disable_usd_notice_handlers or in the finally block of usd_replicate().
In Isaac Sim's normal runtime the active stage is typically already cached (Isaac Sim registers it at startup), so this is usually a no-op insertion. However, in unit tests or lightweight scenarios where the stage is created independently, calling usd_replicate() would silently inject it into the global cache and it would remain there for the lifetime of the process, preventing the stage from being garbage-collected.
Consider checking whether the stage was already in the cache and only storing the ID without re-insertion, or — if an insertion was needed — removing it in the finally path:
cache = UsdUtils.StageCache.Get()
cache_id = cache.GetId(stage)
if not cache_id.IsValid():
cache_id = cache.Insert(stage)
state["stage_inserted_by_us"] = True
else:
state["stage_inserted_by_us"] = False
state["stage_id"] = cache_id.ToLongInt()
state["cache_id_obj"] = cache_idThen in _enable_usd_notice_handlers, conditionally erase:
if state.get("stage_inserted_by_us", False):
UsdUtils.StageCache.Get().Erase(state["cache_id_obj"])| SimulationManager.enable_fabric_usd_notice_handler(stage_id, False) | ||
| SimulationManager.enable_usd_notice_handler(False) | ||
| state["sim_manager_available"] = True | ||
| except (ImportError, Exception) as e: |
There was a problem hiding this comment.
Redundant exception type in
except clause
ImportError is a subclass of Exception, so except (ImportError, Exception) is equivalent to except Exception. Listing ImportError explicitly adds no additional behavior and may give the false impression that it is handled differently from other exceptions.
| except (ImportError, Exception) as e: | |
| except Exception as e: |
| if state.get("fabric_notice_was_enabled", False): | ||
| SimulationManager.enable_fabric_usd_notice_handler(state["stage_id"], True) | ||
| SimulationManager.enable_usd_notice_handler(True) | ||
| except (ImportError, Exception) as e: |
| with Sdf.ChangeBlock(): | ||
| for i in depth_to_indices[depth]: | ||
| src = sources[i] | ||
| tmpl = destinations[i] | ||
| # Select target environments for this source (supports None, [E], or [S, E]) | ||
| target_envs = env_ids if mask is None else env_ids[mask[i]] | ||
| for wid in target_envs.tolist(): | ||
| dp = tmpl.format(wid) | ||
| Sdf.CreatePrimInLayer(rl, dp) | ||
| if src == dp: | ||
| pass # self-copy: CreatePrimInLayer already ensures it exists; CopySpec would be destructive | ||
| else: | ||
| Sdf.CopySpec(rl, Sdf.Path(src), rl, Sdf.Path(dp)) | ||
|
|
||
| if positions is not None or quaternions is not None: | ||
| ps = rl.GetPrimAtPath(dp) | ||
| op_names = [] | ||
| if positions is not None: | ||
| p = positions[wid] | ||
| t_attr = ps.GetAttributeAtPath(dp + ".xformOp:translate") | ||
| if t_attr is None: | ||
| t_attr = Sdf.AttributeSpec(ps, "xformOp:translate", Sdf.ValueTypeNames.Double3) | ||
| t_attr.default = Gf.Vec3d(float(p[0]), float(p[1]), float(p[2])) | ||
| op_names.append("xformOp:translate") | ||
| if quaternions is not None: | ||
| q = quaternions[wid] | ||
| o_attr = ps.GetAttributeAtPath(dp + ".xformOp:orient") | ||
| if o_attr is None: | ||
| o_attr = Sdf.AttributeSpec(ps, "xformOp:orient", Sdf.ValueTypeNames.Quatd) | ||
| # xyzw convention: q[3] is w, q[0:3] is xyz | ||
| o_attr.default = Gf.Quatd(float(q[3]), Gf.Vec3d(float(q[0]), float(q[1]), float(q[2]))) | ||
| op_names.append("xformOp:orient") | ||
| # Only author xformOpOrder for the ops we actually authored | ||
| if op_names: | ||
| op_order = ps.GetAttributeAtPath(dp + ".xformOpOrder") or Sdf.AttributeSpec( | ||
| ps, UsdGeom.Tokens.xformOpOrder, Sdf.ValueTypeNames.TokenArray | ||
| ) | ||
| op_order.default = Vt.TokenArray(op_names) | ||
| for depth in sorted(depth_to_indices.keys()): |
There was a problem hiding this comment.
Single
Sdf.ChangeBlock changes prior per-depth composition semantics
The original code wrapped each depth level in its own Sdf.ChangeBlock(), which caused USD to flush pending change notices and re-compose the stage at the end of each depth. The accompanying comment explicitly noted this was done "to stabilize composition" for nested or interdependent specs.
The new code wraps all depths in one Sdf.ChangeBlock, deferring all notifications (and USD composition) until every Sdf.CopySpec across every depth has completed. For pure SDF layer writes this should be safe because Sdf.CopySpec operates on spec data rather than the composed stage. However, if any path deeper in the hierarchy is defined via composition arcs that reference an ancestor spec authored at a shallower depth in the same call, the composed view of those arcs won't be available until after the outer ChangeBlock closes.
This is likely fine for the tested robot/environment assets, but it is a semantic regression from the original design. If this change is intentional (as an additional performance improvement complementing the notice-handler suppression), please add a comment explaining why per-depth flushing is no longer needed, so future maintainers don't revert it without understanding the trade-off.
| @functools.cache | ||
| def string_to_callable(name: str) -> Callable: |
There was a problem hiding this comment.
@functools.cache is unbounded and never cleared
functools.cache is an alias for lru_cache(maxsize=None), which means every unique string ever passed to string_to_callable is retained for the process lifetime along with a strong reference to the resolved callable (and, transitively, its module). In long-running Isaac Sim training sessions with many distinct callable strings this cache will grow without bound and prevent modules from being garbage-collected.
Two additional edge-cases to be aware of:
- Module reloading: if a downstream user calls
importlib.reload(some_module), cached entries will silently continue returning the pre-reload callable. - Lambda expressions:
eval(name)creates a new function object on each call, but with caching only one instance is ever created. This is harmless for pure lambdas, but could be surprising if someone expected independent closure objects.
Consider using @functools.lru_cache(maxsize=512) to cap memory usage, or document the unbounded nature clearly so maintainers know to call string_to_callable.cache_clear() if needed.
7950929 to
3818cf8
Compare
Disable IFabricUsd and SimulationManager notice handlers during usd_replicate to avoid per-Sdf.CopySpec Fabric sync overhead. Critically, do NOT re-enable the handlers after cloning — the downstream SimulationContext.reset() handles the Fabric sync naturally, and re-enabling triggers an expensive batch resync. Also caches string_to_callable() with @functools.cache and fixes redundant triple ast.parse() in is_lambda_expression(). Key insight: The batch Fabric resync on re-enable is MORE expensive than letting SimulationContext.reset handle sync as part of its normal flow. Measured improvements (4096/8192 envs, 1 GPU, warm run): Isaac-Cartpole-Direct-v0: 8.9s -> 7.4s (-17%) Isaac-Reach-Franka-v0: 29.0s -> 9.6s (-67%) Isaac-Velocity-Rough-H1-v0: 37.9s -> 25.3s (-33%) Isaac-Ant-Direct-v0: 11.9s -> 8.4s (-29%) Isaac-Velocity-Flat-Anymal-D: 29.8s -> 16.2s (-46%) Isaac-Repose-Cube-Shadow-Direct: 46.6s -> 27.0s (-42%) Average improvement: -39% No FPS regressions. All tasks pass. 18/18 cloner tests pass.
3818cf8 to
9e9d7e4
Compare
There was a problem hiding this comment.
🤖 Isaac Lab Review Bot
Summary
Performance optimization that disables USD notice handlers (IFabricUsd + SimulationManager) during bulk Sdf.CopySpec cloning in usd_replicate(). The core idea is sound — this mirrors what isaacsim.core.cloner.Cloner.disable_change_listener() already does, and Isaac Lab's usd_replicate() was missing this optimization. The Sdf.ChangeBlock consolidation from per-depth to single-block is also correct since Sdf.CreatePrimInLayer creates ancestor specs immediately within the layer. The is_lambda_expression triple-parse fix and string_to_callable caching are clean, low-risk wins.
Architecture Impact
Low-medium. The change intentionally leaves notice handlers permanently disabled after cloning, relying on downstream SimulationContext.reset() to re-sync Fabric. This is valid for the standard startup flow, but creates a hidden contract: any code that calls usd_replicate() outside the normal init→reset pipeline (e.g., runtime re-cloning, tooling scripts) will silently operate with disabled handlers and no Fabric sync.
Implementation Verdict
Approve with nits. The optimization is correct and well-motivated by Nsight profiling data. There are several cleanups needed (dead code, overly broad exception handling, unused refactoring, stale comment), detailed in 5 inline comments below. Nothing that blocks merging.
Test Coverage
No new tests added. Relies on existing cloner tests (18/18 per commit message) and end-to-end benchmarks. Acceptable for a perf optimization.
CI Status
| state["sim_manager_available"] = True | ||
| except (ImportError, Exception) as e: | ||
| logger.debug("Could not disable notice handlers: %s", e) | ||
| state["sim_manager_available"] = False |
There was a problem hiding this comment.
except (ImportError, Exception) is a bare except Exception in disguise.
ImportError is a subclass of Exception, so listing both is redundant. More importantly, catching all Exception silently swallows real bugs — e.g., TypeError from a wrong arg count to enable_fabric_usd_notice_handler, or AttributeError from API changes in newer IsaacSim versions. Narrow this to the exceptions you actually expect:
| state["sim_manager_available"] = False | |
| except (ImportError, AttributeError, RuntimeError) as e: |
Same issue in _enable_usd_notice_handlers at position 62.
|
|
||
| def _enable_usd_notice_handlers(state: dict) -> None: | ||
| """Re-enable USD notice handlers after bulk scene authoring. | ||
|
|
There was a problem hiding this comment.
_enable_usd_notice_handlers is dead code — never called anywhere in this PR.
The commit message and inline comments explicitly state handlers are intentionally left disabled ("do NOT re-enable"). This function appears only at its definition and a docstring cross-reference. The state dict returned by _disable_usd_notice_handlers is also vestigial since no caller uses it for restoration.
Recommend either:
- Remove it (preferred — YAGNI, reduces maintenance burden), or
- If kept as future-proofing, add
# NOTE: currently unused; kept for callers needing explicit re-enable
| cfg: TemplateCloneCfg = template_clone_cfg | ||
| _clone_from_template_impl(stage, num_clones, cfg) | ||
|
|
||
|
|
There was a problem hiding this comment.
clone_from_template → _clone_from_template_impl split adds no value.
The wrapper just assigns cfg = template_clone_cfg and delegates. The original code did this inline. This looks like scaffolding for wrapping handler management around the impl (disable before, pass _manage_notice_handlers=False to inner calls), but that wrapping was never added — and the commit message says re-enabling is intentionally avoided.
Suggestion: either collapse back into a single function, or wire up the intended optimization:
def clone_from_template(stage, num_clones, template_clone_cfg):
cfg = template_clone_cfg
_disable_usd_notice_handlers(stage) # once for all inner calls
_clone_from_template_impl(stage, num_clones, cfg) # inner calls use _manage_notice_handlers=FalseThis would avoid 2-3 redundant SimulationManager round-trips per startup.
| quaternions: torch.Tensor | None = None, | ||
| _manage_notice_handlers: bool = True, | ||
| ) -> None: | ||
| """Replicate USD prims to per-environment destinations. |
There was a problem hiding this comment.
_manage_notice_handlers parameter is unused API surface.
No call site passes _manage_notice_handlers=False. The three usd_replicate() calls inside _clone_from_template_impl (lines 124, 138, 148) and the two in InteractiveScene (lines 176, 243) all use the default True, meaning handlers are redundantly disabled on every call.
This is functionally harmless (the is_fabric_usd_notice_handler_enabled guard makes repeated disabling a no-op) but the parameter + docstring create a misleading API contract — especially the phrase "disable/re-enable" when handlers are never re-enabled.
Recommend either:
- Remove the parameter entirely (current effective behavior: always disable), or
- Wire it up: have
clone_from_templatedisable once and passFalseto innerusd_replicatecalls
| with Sdf.ChangeBlock(): | ||
| # Wrap all depths in a single ChangeBlock so notifications fire once at the end. | ||
| with Sdf.ChangeBlock(): | ||
| for depth in sorted(depth_to_indices.keys()): |
There was a problem hiding this comment.
Stale comment on the preceding context line ("committing at each depth to stabilize composition") should be updated.
The comment at line 250 (in the file) says "Process in layers of equal depth, committing at each depth to stabilize composition" — but the per-depth ChangeBlock commit is gone. The new single-block approach is correct since Sdf.CreatePrimInLayer creates ancestor specs immediately at the Sdf layer level, so depth ordering is maintained without per-depth commits. Suggest updating that context-line comment to:
# Process in layers of equal depth (parents before children) within a single ChangeBlock.
| state["stage_id"] = stage_id | ||
|
|
||
| try: | ||
| from isaacsim.core.simulation_manager.impl.simulation_manager import SimulationManager |
There was a problem hiding this comment.
where does the SimulationManager from isaac sim get triggered from? I think we've removed the reference to it in isaac lab. or is this just something that we want to overwrite in kit settings directly?
There was a problem hiding this comment.
Good question. The SimulationManager here isn't being used the way Isaac Lab references it in simulation_context.py (where it's just for set_physics_sim_device / get_physics_sim_device).
In this PR, the import is specifically for SimulationManager's USD notice handler control — enable_fabric_usd_notice_handler() and enable_usd_notice_handler(). These are static methods on isaacsim.core.simulation_manager.impl.simulation_manager.SimulationManager that control Kit-level USD notice listeners. The SimulationManager registers itself as a Tf.Notice listener on the USD stage during Isaac Sim startup (inside the isaacsim.core.simulation_manager Kit extension). When any USD change happens, it triggers Fabric sync via IFabricUsd — which is exactly the per-change overhead this PR is trying to avoid during bulk cloning.
So to answer your question: it's triggered at the Kit/Omniverse extension level, not from Isaac Lab. Isaac Lab doesn't register or manage these notice handlers — it's Isaac Sim infrastructure. This PR is reaching into Isaac Sim's internals (same as what isaacsim.core.cloner.Cloner.disable_change_listener() does) to temporarily suppress those listeners during bulk Sdf.CopySpec operations.
You're right that Isaac Lab itself doesn't own this — the question is whether it's better to:
- Call into Isaac Sim's
SimulationManagerdirectly (as this PR does), or - Use Kit settings /
carb.settingsto disable the handlers (e.g., if there's a settings-based toggle)
Option 1 mirrors what Isaac Sim's own Cloner does, so it's consistent. But if there's a cleaner Kit settings path that achieves the same thing, that could be less coupled.
Description
Summary
Disable IFabricUsd and SimulationManager USD notice handlers during bulk environment cloning in usd_replicate(), reducing startup time by 25-35% for manager-based tasks.
Problem
When usd_replicate() calls Sdf.CopySpec to clone thousands of environment prims, every single copy triggers IFabricUsd UsdNoticeListener::Handle(UsdNotice::ObjectsChanged) — an expensive Fabric↔USD sync operation. For Reach-Franka (4096 envs), this added 17 seconds (438 notice handler calls) during scene construction.
IsaacSim's own Cloner.disable_change_listener() already handles this by disabling the notice handler before cloning, but IsaacLab v3's replacement usd_replicate() function didn't replicate this behavior.
Additionally, the import path from isaacsim.core.simulation_manager import SimulationManager resolves to PhysxManager under IsaacLab's lazy_export module system, which lacks the enable_fabric_usd_notice_handler method. The correct import is from isaacsim.core.simulation_manager.impl.simulation_manager import SimulationManager.
Changes
source/isaaclab/isaaclab/cloner/cloner_utils.py:
• Add _disable_usd_notice_handlers() / _enable_usd_notice_handlers() helpers that mirror isaacsim.core.cloner.Cloner.disable_change_listener()
• Wrap usd_replicate() body in try/finally to disable handlers during bulk USD authoring
• Use correct full import path for SimulationManager
source/isaaclab/isaaclab/utils/string.py:
• Cache string_to_callable() with @functools.cache to avoid repeated importlib.import_module calls
• Fix triple ast.parse() call in is_lambda_expression()
Benchmark Results
Tested on DGXC (2x NVIDIA L40, 4096 envs, 1 GPU, warm run):
Root Cause Analysis
Profiling with Nsight Systems showed IFabricUsd UsdNoticeListener::Handle consuming:
• 17.0s for Reach-Franka (438 calls)
• 9.9s for H1-Rough
• 22.9s for Dexsuite-Kuka
These notice handler calls happen during usd_replicate when Sdf.CopySpec modifies the USD stage. By disabling the handlers (as IsaacSim's own Cloner does), the Fabric sync is deferred to a single batch at the end.
Type of change
Checklist
pre-commitchecks with./isaaclab.sh --formatconfig/extension.tomlfileCONTRIBUTORS.mdor my name already exists there