
[WIP][Newton] Migrate manager based workflow to warp #4480

Open
hujc7 wants to merge 5 commits into isaac-sim:dev/newton from hujc7:dev-newton-warp-mig-manager-based

Conversation


hujc7 commented Jan 28, 2026

Description

Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context.
List any dependencies that are required for this change.

Fixes # (issue)

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (existing functionality will not work without user modification)
  • Documentation update

Screenshots

Please attach before and after screenshots of the change if applicable.

Checklist

  • I have read and understood the contribution guidelines
  • I have run the pre-commit checks with ./isaaclab.sh --format
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the changelog and the corresponding version in the extension's config/extension.toml file
  • I have added my name to the CONTRIBUTORS.md or my name already exists there

github-actions bot added the isaac-lab (Related to Isaac Lab team) label on Jan 28, 2026
hujc7 changed the title from [Newton] Migrate manager based workflow to warp to [WIP][Newton] Migrate manager based workflow to warp on Jan 28, 2026
hujc7 force-pushed the dev-newton-warp-mig-manager-based branch from d4796f8 to ef0e0f6 on January 28, 2026 10:32
hujc7 force-pushed the dev-newton-warp-mig-manager-based branch from 247d950 to 108a707 on February 10, 2026 09:23

hujc7 commented Feb 10, 2026

P1: /home/jichuanh/workspaces/isaac/data/manager-based/cp-torch-more-300.txt
P2: /home/jichuanh/workspaces/isaac/data/manager-based/cpv-G-ESRT.txt

Training result (last step)

iteration P1: 299/300; iteration P2: 299/300

| metric | P1 | P2 | % change |
| --- | --- | --- | --- |
| Computation | 302717 steps/s | 266494 steps/s | -11.97% |
| Mean action noise std | 0.06 | 0.09 | +50.00% |
| Mean value_function loss | 0.0000 | 0.0004 | |
| Mean surrogate loss | 0.0062 | 0.0045 | -27.42% |
| Mean entropy loss | -1.4284 | -1.0168 | -28.82% |
| Mean reward | 4.94 | 4.93 | -0.20% |
| Mean episode length | 300.00 | 300.00 | +0.00% |
| Episode_Reward/alive | 1.0000 | 1.0000 | +0.00% |
| Episode_Reward/terminating | 0.0000 | 0.0000 | |
| Episode_Reward/pole_pos | -0.0084 | -0.0108 | +28.57% |
| Episode_Reward/cart_vel | -0.0035 | -0.0042 | +20.00% |
| Episode_Reward/pole_vel | -0.0011 | -0.0014 | +27.27% |
| Episode_Termination/time_out | 0.9993 | 0.9976 | -0.17% |
| Episode_Termination/cart_out_of_bounds | 0.0007 | 0.0024 | +242.86% |
| Total timesteps | 19660800 | 19660800 | +0.00% |
| Iteration time | 0.22s | 0.25s | +13.64% |
| Time elapsed | 00:01:23 | 00:00:50 | -39.76% |

Step

gap P1: -4.72% (Σsub 7941.26 us/step vs timed 8334.90 us/step); gap P2: -13.46% (Σsub 1937.77 us/step vs timed 2239.05 us/step)
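For reference, the gap appears to be the relative difference between the summed sub-timers and the directly timed total; a minimal sketch of that arithmetic (variable names are mine, not the profiler's):

```python
# gap between summed sub-timers and the directly timed step (P1 numbers above)
sigma_sub_us = 7941.26  # sum of per-operation timers, us/step
timed_us = 8334.90      # directly timed "Step", us/step
gap_pct = (sigma_sub_us - timed_us) / timed_us * 100.0
print(f"{gap_pct:.2f}%")  # -4.72%
```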

| operation | P1 time/call (us) | P2 time/call (us) | % change (per-step) |
| --- | --- | --- | --- |
| Step (1) | 8334.90 | 2239.05 | -73.14% |
| Newton simulation step (2) | 448.57 | 447.09 | -0.33% |
| Scene.update (2) | 160.93 | 175.88 | +9.29% |
| Reset idx (1) | 2993.70 | 342.93 | -88.54% |
| Action preprocessing (1) | 20.17 | 104.55 | +418.40% |
| Reset selection (1) | 61.61 | 92.86 | +50.72% |
| ActionManager.apply_action (2) | 424.85 | 31.63 | -92.56% |
| Termination+Reward (1) | 1314.45 | 39.25 | -97.01% |
| ActionManager.process_action (1) | 112.61 | 21.52 | -80.89% |
| ObservationManager.compute (update_history) (1) | 199.23 | 18.35 | -90.79% |
| CommandManager.compute (1) | 8.91 | 9.11 | +2.21% |
| Scene.write_data_to_sim (2) | 580.94 | | |

Reset idx

gap P1: -7.69% (Σsub 2763.60 us/step vs timed 2993.70 us/step); gap P2: -61.19% (Σsub 133.11 us/step vs timed 342.93 us/step)

| operation | P1 time/call (us) | P2 time/call (us) | % change (per-step) |
| --- | --- | --- | --- |
| Reset idx (1) | 2993.70 | 342.93 | -88.54% |
| Action+Reward reset (1.0002) | 471.28 | 26.06 | -94.47% |
| EventManager.apply (reset) (1.0002) | 1890.76 | 21.40 | -98.87% |
| TerminationManager.reset (1.0002) | 89.98 | 21.23 | -76.40% |
| Scene.reset (1.0002) | 252.66 | 17.34 | -93.14% |
| ObservationManager.reset (1.0002) | 10.92 | 9.42 | -13.71% |
| CurriculumManager.compute (reset) (1.0002) | 9.46 | 9.21 | -2.63% |
| EventManager.reset (1.0002) | 14.34 | 7.35 | -48.72% |
| CurriculumManager.reset (1.0002) | 9.08 | 7.28 | -19.81% |
| CommandManager.reset (1.0002) | 7.06 | 7.01 | -0.70% |
| RecorderManager.reset (1.0002) | 7.49 | 6.77 | -9.58% |


hujc7 commented Feb 11, 2026

Since the timer itself has overhead and also performs a wp.sync, here is data with only the step timed:

P1: /home/jichuanh/workspaces/isaac/data/manager-based/cpv-torch-baseline-less.txt
P2: /home/jichuanh/workspaces/isaac/data/manager-based/cpv-G-baseline-reset-step-off.txt
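For context, timing a section of asynchronous GPU work requires synchronizing the device before reading the clock, which both adds overhead and removes overlap; a minimal sketch of the pattern (not the actual Timer implementation):

```python
import time

import warp as wp

start = time.perf_counter()
# ... asynchronous wp.launch(...) calls for the step would go here ...
wp.synchronize()  # block until all queued GPU work has finished
elapsed_us = (time.perf_counter() - start) * 1e6
print(f"Step took: {elapsed_us:.2f} us")
```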

Training result (last step)

iteration P1: 299/300
iteration P2: 299/300

| metric | P1 | P2 | % change |
| --- | --- | --- | --- |
| Computation | 344628 steps/s | 227807 steps/s | -33.90% |
| Mean action noise std | 0.06 | 0.09 | +50.00% |
| Mean value_function loss | 0.0000 | 0.0004 | |
| Mean surrogate loss | 0.0062 | 0.0045 | -27.42% |
| Mean entropy loss | -1.4284 | -1.0168 | -28.82% |
| Mean reward | 4.94 | 4.93 | -0.20% |
| Mean episode length | 300.00 | 300.00 | +0.00% |
| Episode_Reward/alive | 1.0000 | 1.0000 | +0.00% |
| Episode_Reward/terminating | 0.0000 | 0.0000 | |
| Episode_Reward/pole_pos | -0.0084 | -0.0108 | +28.57% |
| Episode_Reward/cart_vel | -0.0035 | -0.0042 | +20.00% |
| Episode_Reward/pole_vel | -0.0011 | -0.0014 | +27.27% |
| Episode_Termination/time_out | 0.9993 | 0.9976 | -0.17% |
| Episode_Termination/cart_out_of_bounds | 0.0007 | 0.0024 | +242.86% |
| Total timesteps | 19660800 | 19660800 | +0.00% |
| Iteration time | 0.19s | 0.29s | +52.63% |
| Time elapsed | 00:01:14 | 00:00:53 | -28.38% |

Step

gap P1: -100.00% (Σsub 0.00 us/step vs timed 6974.04 us/step)
gap P2: -100.00% (Σsub 0.00 us/step vs timed 1431.46 us/step)

| operation | P1 time/call (us) | P2 time/call (us) | % change (per-step) |
| --- | --- | --- | --- |
| Step (1) | 6974.04 | 1431.46 | -79.47% |


hujc7 commented Feb 13, 2026

The convergence issue is fixed. However, with the event manager migrated, there is still a slower-to-converge phase between mean episode lengths of 100-200, likely due to RNG usage: the torch implementation samples events from a single seed, while the warp version keeps per-env RNG state, producing a slightly different distribution over short horizons.
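For illustration, the per-env state pattern looks roughly like this in Warp (a sketch, not the PR's actual kernels; wp.rand_init derives an independent stream per env from a shared seed):

```python
import warp as wp

@wp.kernel
def init_rng(seed: int, rng_state: wp.array(dtype=wp.uint32)):
    env_id = wp.tid()
    # each env gets its own stream offset; the torch baseline instead draws
    # all envs from one generator seeded once
    rng_state[env_id] = wp.rand_init(seed, env_id)

@wp.kernel
def sample_event(rng_state: wp.array(dtype=wp.uint32), out: wp.array(dtype=wp.float32)):
    env_id = wp.tid()
    s = rng_state[env_id]
    out[env_id] = wp.randf(s, -0.1, 0.1)  # advances s in place
    rng_state[env_id] = s  # persist the advanced state for the next event
```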

Below is the comparison of the torch run vs the captured warp run.
P1: data/regression-test/torch-baseline.txt
P2: data/regression-test/2026-02-13_02-32-13_single-manager-warp-sweep_Isaac-Cartpole-Warp-v0_ne4096_it300_rep1/00_default_only_2_r01.txt

Training result (last step)

iteration P1: 299/300
iteration P2: 299/300

| metric | P1 | P2 | % change |
| --- | --- | --- | --- |
| Computation | 319793 steps/s | 591688 steps/s | +85.02% |
| Mean action noise std | 0.07 | 0.09 | +28.57% |
| Mean value_function loss | 0.0000 | 0.0000 | |
| Mean surrogate loss | 0.0060 | 0.0058 | -3.33% |
| Mean entropy loss | -1.1987 | -1.0135 | -15.45% |
| Mean reward | 4.93 | 4.94 | +0.20% |
| Mean episode length | 300.00 | 300.00 | +0.00% |
| Episode_Reward/alive | 1.0000 | 1.0000 | +0.00% |
| Episode_Reward/terminating | 0.0000 | 0.0000 | |
| Episode_Reward/pole_pos | -0.0089 | -0.0099 | +11.24% |
| Episode_Reward/cart_vel | -0.0037 | -0.0037 | -0.00% |
| Episode_Reward/pole_vel | -0.0012 | -0.0013 | +8.33% |
| Episode_Termination/time_out | 0.9976 | 0.9983 | +0.07% |
| Episode_Termination/cart_out_of_bounds | 0.0024 | 0.0017 | -29.17% |
| Total timesteps | 19660800 | 19660800 | +0.00% |
| Iteration time | 0.20s | 0.11s | -45.00% |
| Time elapsed | 00:01:19 | 00:00:51 | -35.44% |

Step

gap P1: -100.00% (Σsub 0.00 us/step vs timed 8006.61 us/step)
gap P2: -100.00% (Σsub 0.00 us/step vs timed 1502.66 us/step)

| operation | P1 time/call (us) | P2 time/call (us) | % change (per-step) |
| --- | --- | --- | --- |
| Step (1) | 8006.61 | 1502.66 | -81.23% |

Training convergence

| threshold | P1 first iter | P2 first iter | Δiter (P2-P1) |
| --- | --- | --- | --- |
| 50 | 3 | 3 | +0 |
| 100 | 6 | 17 | +11 |
| 150 | 22 | 30 | +8 |
| 200 | 27 | 35 | +8 |
| 250 | 37 | 38 | +1 |
| 300 | 48 | 49 | +1 |

hujc7 marked this pull request as ready for review on February 18, 2026 06:06

greptile-apps bot commented Feb 18, 2026

Greptile Summary

This PR migrates the manager-based RL workflow to use Warp (NVIDIA's Python framework for GPU-accelerated computing) for improved performance through CUDA graph capture. The implementation adds an experimental fork of the stable environment under isaaclab_experimental with ~8,100 new lines of code across 52 files.

Key Changes

  • New Warp-first environment infrastructure: ManagerBasedEnvWarp and ManagerBasedRLEnvWarp provide experimental base classes with CUDA graph capture support
  • Manager call switching: ManagerCallSwitch allows runtime selection between stable (mode 0), warp-uncaptured (mode 1), and warp-captured (mode 2) execution for each manager independently (sketched after this list)
  • Warp-native managers: All managers (Action, Observation, Reward, Termination, Command, Event) reimplemented with Warp kernels and boolean mask-based APIs
  • MDP terms: New Warp-compatible observation, reward, termination, and event terms with in-place output writes
  • Cartpole task: Reference implementation demonstrating the Warp workflow with Newton physics backend
  • Utility infrastructure: Circular buffers for observation history, Warp noise models, and RNG state management
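A minimal sketch of the mode-switch idea (names follow the summary above; the actual class in manager_based_env_warp.py will differ in detail):

```python
from enum import IntEnum

class CallMode(IntEnum):
    STABLE = 0         # original torch-based manager implementations
    WARP = 1           # warp kernels launched eagerly (easier to debug)
    WARP_CAPTURED = 2  # warp kernels replayed through a CUDA graph

class ManagerCallSwitch:
    """Per-manager selection between stable, warp, and captured execution."""

    def __init__(self, default: CallMode = CallMode.WARP_CAPTURED):
        self.default = default
        self.overrides: dict[str, CallMode] = {}

    def get_mode_for_manager(self, name: str) -> CallMode:
        # names that are not registered (e.g. "Scene") fall back to the default
        return self.overrides.get(name, self.default)
```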

Architecture

The workflow uses persistent Warp buffers with stable pointers to enable CUDA graph capture. Manager terms write results in-place to pre-allocated buffers rather than returning values. Environment mask resolution uses shared scratch buffers (ALL_ENV_MASK, ENV_MASK) to avoid allocations during captured execution.
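The capture constraint is the key design driver: every buffer a captured kernel touches must keep a stable pointer across replays. A minimal sketch of that pattern in Warp (illustrative names, not the PR's code):

```python
import warp as wp

wp.init()
device = "cuda:0"
num_envs = 4096

# persistent buffers allocated once; their pointers must never change, or the
# captured graph would replay against stale memory
joint_vel = wp.zeros(num_envs, dtype=wp.float32, device=device)
reward_buf = wp.zeros(num_envs, dtype=wp.float32, device=device)

@wp.kernel
def reward_term(vel: wp.array(dtype=wp.float32), out: wp.array(dtype=wp.float32)):
    i = wp.tid()
    out[i] = -0.005 * vel[i] * vel[i]  # write in place, never allocate

# record the launch sequence once ...
with wp.ScopedCapture(device=device) as capture:
    wp.launch(reward_term, dim=num_envs, inputs=[joint_vel], outputs=[reward_buf], device=device)

# ... then replay it each step with no Python or launch overhead
for _ in range(300):
    wp.capture_launch(capture.graph)
```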

Previous Review Comments

Several issues were identified in earlier review threads and are noted as known concerns:

  • resolve_env_mask handling of Python lists (manager_based_env_warp.py:487)
  • Always-on timer decorators in both stable and experimental envs
  • Duplicate sim field in CartpoleEnvCfg (appears resolved in latest commit a247e5d)
  • RNG state increment pattern in command_manager.py differs from other kernels
  • SceneEntityCfg.resolve() assumes Articulation type
  • Shared scratch mask re-entrancy warning in articulation_data.py

These are acknowledged limitations of the WIP implementation and do not block experimental development.

Confidence Score: 3/5

  • This is a work-in-progress experimental migration with known limitations suitable for experimental branch development
  • This PR introduces a large experimental codebase (~8K lines) with several known issues already documented in previous review threads. The implementation is architecturally sound for experimental development but has correctness concerns (env mask handling, RNG state patterns, scratch buffer re-entrancy) that prevent production use. The WIP status and experimental placement are appropriate - this enables isolated development without affecting stable workflows.
  • Pay close attention to manager_based_env_warp.py (env mask resolution), command_manager.py (RNG advancement), and articulation_data.py (shared scratch buffers) if extending beyond Cartpole

Important Files Changed

| Filename | Overview |
| --- | --- |
| source/isaaclab_experimental/isaaclab_experimental/envs/manager_based_rl_env_warp.py | Experimental RL env with Warp integration, manager call switching, and CUDA graph capture support. Has timer enabled by default (line 197). |
| source/isaaclab_experimental/isaaclab_experimental/envs/manager_based_env_warp.py | Base experimental env with ManagerCallSwitch for stable/warp/captured execution modes. Handles env mask resolution and scene setup. |
| source/isaaclab_experimental/isaaclab_experimental/managers/reward_manager.py | Warp-native reward manager with per-term output buffers and CUDA graph support. Stores weighted reward rates (not step rewards) in _step_reward_wp. |
| source/isaaclab_experimental/isaaclab_experimental/managers/observation_manager.py | Warp-first observation manager with circular buffer history support, per-term noise/modifiers, and in-place observation computation. |
| source/isaaclab_experimental/isaaclab_experimental/managers/action_manager.py | Action manager with Warp-compatible action terms; supports process_actions and apply_actions with mask-based reset. |
| source/isaaclab_experimental/isaaclab_experimental/managers/command_manager.py | Command manager with Warp-based resampling. RNG state write-back on line 76 may skip RNG outputs (already flagged). |
| source/isaaclab_experimental/isaaclab_experimental/managers/termination_manager.py | Warp termination manager with boolean masks for dones/terminated/time_outs and episode logging support. |
| source/isaaclab_experimental/isaaclab_experimental/managers/event_manager.py | Event manager with interval/reset/startup modes and Warp mask-based event application with RNG state management. |
| source/isaaclab_tasks_experimental/isaaclab_tasks_experimental/manager_based/classic/cartpole/cartpole_env_cfg.py | Cartpole task config with Newton physics and Warp-first MDP terms. Previously flagged duplicate sim field removed in latest commit. |
| source/isaaclab_newton/isaaclab_newton/assets/articulation/articulation_data.py | Newton articulation data with Warp state buffers, mask resolvers, and timestamped buffer management. |

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[ManagerBasedRLEnvWarp] --> B[ManagerCallSwitch]
    A --> C[Scene w/ Articulation]
    A --> D[ActionManager]
    A --> E[ObservationManager]
    A --> F[RewardManager]
    A --> G[TerminationManager]
    A --> H[CommandManager]
    A --> I[EventManager]

    B -->|Mode 0| J[Stable Managers]
    B -->|Mode 1| K[Warp Not Captured]
    B -->|Mode 2| L[Warp CUDA Graph]

    D --> M[JointEffortAction]
    M --> N[Articulation.set_joint_effort_target]

    E --> O[CircularBuffer History]
    E --> P[Noise/Modifiers]

    F --> Q[Per-term Warp Buffers]
    F --> R[_reward_finalize Kernel]

    G --> S[Boolean Masks]
    G --> T[Episode Logging]

    H --> U[RNG State Management]

    I --> V[reset/interval/startup]
    I --> W[reset_joints_by_offset]

    N --> C
    W --> C

    style L fill:#90EE90
    style J fill:#FFB6C1
    style K fill:#FFE4B5
```

Last reviewed commit: a247e5d

greptile-apps bot left a comment

51 files reviewed, 9 comments

Comment on lines 487 to 490
```python
    # ids provided as python sequence (already normalized to list above)
    if len(env_ids) == 0:
        return self.ENV_MASK
    raise ValueError(f"Unsupported env_ids type: {type(env_ids)}")
```

resolve_env_mask fails for non-empty Python lists

When env_ids is a non-empty Python list (e.g. [0, 2, 5]), the code reaches line 490 and unconditionally raises ValueError. The empty-list check on line 488 is handled, but non-empty lists are never converted to a Warp array and fed to the kernel. This means any caller that passes a plain Python list of env ids (a common pattern) will crash.

You likely need to convert the Python list to a wp.array and launch the kernel, similar to the ArticulationData._resolve_1d_mask implementation:

Suggested change
```diff
     # ids provided as python sequence (already normalized to list above)
     if len(env_ids) == 0:
         return self.ENV_MASK
-    raise ValueError(f"Unsupported env_ids type: {type(env_ids)}")
+    ids_wp = wp.array(env_ids, dtype=wp.int32, device=self.device)
+    wp.launch(
+        kernel=_generate_env_mask_from_ids_int32,
+        dim=ids_wp.shape[0],
+        inputs=[self.ENV_MASK, ids_wp],
+        device=self.device,
+    )
+    return self.ENV_MASK
```

Note: this allocates a temporary wp.array which is not CUDA-graph-capture friendly, but it matches the behavior of ArticulationData._resolve_1d_mask and is required for correctness on non-captured paths.

Comment on lines 164 to 197
```python
    sim: SimulationCfg = SimulationCfg(
        newton_cfg=NewtonCfg(
            solver_cfg=MJWarpSolverCfg(
                nconmax=5,
            ),
        )
    )

    # Scene settings
    scene: CartpoleSceneCfg = CartpoleSceneCfg(num_envs=4096, env_spacing=4.0, clone_in_fabric=True)
    # Basic settings
    observations: ObservationsCfg = ObservationsCfg()
    actions: ActionsCfg = ActionsCfg()
    events: EventCfg = EventCfg()
    # MDP settings
    rewards: RewardsCfg = RewardsCfg()
    terminations: TerminationsCfg = TerminationsCfg()
    # Simulation settings
    sim: SimulationCfg = SimulationCfg(
        newton_cfg=NewtonCfg(
            solver_cfg=MJWarpSolverCfg(
                njmax=5,
                nconmax=3,
                ls_iterations=10,
                cone="pyramidal",
                impratio=1,
                ls_parallel=True,
                integrator="implicit",
            ),
            num_substeps=1,
            debug_mode=False,
            use_cuda_graph=True,
        )
    )
```

Duplicate sim field — first definition is silently overridden

CartpoleEnvCfg defines sim: SimulationCfg twice (lines 164-170 and lines 182-197). Because Python @configclass/@dataclass semantics allow field redefinition, the first assignment is silently overridden by the second. This makes the code confusing and the first definition on line 164 is dead code. Consider removing the first sim definition to avoid confusion.

```diff
@@ -149,6 +156,7 @@ def setup_manager_visualizers(self):
     Operations - MDP
     """
```

Always-on timer decorator on step() in the stable env

The @Timer(name="env_step", ..., enable=True) decorator is hardcoded to enable=True, meaning every call to step() in the stable ManagerBasedRLEnv now incurs timer overhead unconditionally. The TIMER_ENABLED_STEP flag only controls the inner section timers. This should likely use enable=TIMER_ENABLED_STEP or be controlled by the same flag to avoid always-on overhead in production use.

Suggested change
```diff
     """
+    @Timer(name="env_step", msg="Step took:", enable=TIMER_ENABLED_STEP, format="us")
```

```python
        self.reset_terminated = self.termination_manager.terminated
        self.reset_time_outs = self.termination_manager.time_outs

    @Timer(name="env_step", msg="Step took:", enable=True, format="us")
```

Always-on timer decorator on step() in experimental env

Same as the stable env: the @Timer(name="env_step", ..., enable=True) decorator is hardcoded to enable=True. This means every call to step() incurs timer overhead unconditionally, even though the TIMER_ENABLED_STEP flag controls the inner section timers. Consider using enable=TIMER_ENABLED_STEP for consistency.

Suggested change
```diff
-    @Timer(name="env_step", msg="Step took:", enable=True, format="us")
+    @Timer(name="env_step", msg="Step took:", enable=TIMER_ENABLED_STEP, format="us")
```

Comment on lines +117 to +120
```python
def _manager_name_from_stage(self, stage: str) -> str:
    if "_" not in stage:
        raise ValueError(f"Invalid stage '{stage}'. Expected '{{manager_name}}_{{function_name}}'.")
    return stage.split("_", 1)[0]
```

Stage name parsing assumes single underscore delimiter

_manager_name_from_stage splits on the first _ to derive the manager name (e.g., "ActionManager_process_action" → "ActionManager"). However, Scene_write_data_to_sim and Scene_reset use "Scene" as the prefix, which is not in MANAGER_NAMES. This means get_mode_for_manager("Scene") will always fall through to the default mode. While this works by accident (the default is WARP_CAPTURED), it means the Scene stage can never be independently configured to a different mode.

This isn't a bug per se, but worth being aware of since it creates an implicit coupling between the Scene call mode and the default configuration.
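For concreteness, the parsing behavior under that assumption (hypothetical values):

```python
stage = "ActionManager_process_action"
stage.split("_", 1)[0]        # -> "ActionManager", found in MANAGER_NAMES

scene_stage = "Scene_write_data_to_sim"
scene_stage.split("_", 1)[0]  # -> "Scene": not in MANAGER_NAMES, so
                              # get_mode_for_manager falls back to the default
```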

Comment on lines +72 to +78

```python
@wp.kernel
def _reward_pre_compute_reset(
    # output
    reward_buf: wp.array(dtype=wp.float32),
    step_reward: wp.array(dtype=wp.float32, ndim=2),
    term_outs: wp.array(dtype=wp.float32, ndim=2),
```

_reward_finalize stores weighted reward rate but uses it as step reward

The comment on line 68 says "store weighted reward rate (matches old: value/dt)", but the code stores raw * weight without dividing by dt. Line 73 then multiplies by dt to compute the actual step reward contribution (val = weighted * dt). The step_reward array appears to store weighted-but-not-dt-scaled values, which is indeed a "rate" interpretation. However, this could be confusing for downstream consumers of _step_reward_wp (used in get_active_iterable_terms) — the logged values represent reward rates, not actual per-step rewards. Worth verifying this matches the expected semantics.
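In plain terms, with illustrative numbers (not from the PR):

```python
raw, weight, dt = 0.5, -0.1, 0.005
weighted_rate = raw * weight      # what step_reward stores: a per-second rate
step_reward = weighted_rate * dt  # what actually accumulates into reward_buf
```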

Comment on lines 74 to 76
```python
    s = rng_state[env_id]
    time_left[env_id] = wp.randf(s, lower, upper)
    rng_state[env_id] = s + wp.uint32(1)
```

RNG state increment uses addition instead of wp.randf

In _resample_time_left_and_increment_counter, the RNG state is advanced by simple addition (rng_state[env_id] = s + wp.uint32(1)), unlike other kernels in this PR (e.g., in event_manager.py and noise_model.py) that advance the state implicitly through wp.randf / wp.randn calls. After wp.randf on line 75, the state s has already been advanced internally by Warp. Writing s + 1 back effectively skips one RNG output. While this won't cause a crash, it may produce slightly different random streams compared to the pattern used elsewhere in the codebase.

Suggested change
```diff
     s = rng_state[env_id]
     time_left[env_id] = wp.randf(s, lower, upper)
-    rng_state[env_id] = s + wp.uint32(1)
+    rng_state[env_id] = s
```

Comment on lines 34 to 51
```python
def resolve(self, scene: InteractiveScene):
    # run the stable resolution first (fills joint_ids/body_ids from names/regex)
    super().resolve(scene)

    # Build a Warp joint mask for articulations.
    entity: Articulation = scene[self.name]

    # Pre-allocate a full-length mask (all True for default selection).
    if self.joint_ids == slice(None):
        joint_ids_list = list(range(entity.num_joints))
        mask_list = [True] * entity.num_joints
    else:
        joint_ids_list = list(self.joint_ids)
        mask_list = [False] * entity.num_joints
        for idx in joint_ids_list:
            mask_list[idx] = True
    self.joint_mask = wp.array(mask_list, dtype=wp.bool, device=scene.device)
    self.joint_ids_wp = wp.array(joint_ids_list, dtype=wp.int32, device=scene.device)
```

resolve() assumes the entity is always an Articulation

SceneEntityCfg.resolve() unconditionally casts scene[self.name] to Articulation (line 39) and accesses entity.num_joints. If the entity is a different type (e.g., a RigidObject), this will fail with an AttributeError. The base class SceneEntityCfg supports non-articulation entities, so this override should either check the entity type or guard against it.

Since this is currently only used for the Cartpole task (which is always an Articulation), this isn't immediately broken, but it will be a problem when the experimental workflow is extended to other asset types.
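One possible guard, assuming skipping the mask build is acceptable for non-articulation entities (a sketch, not a prescribed fix):

```python
entity = scene[self.name]
if not isinstance(entity, Articulation):
    # leave joint_mask/joint_ids_wp unset for rigid objects and other assets
    return
```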

Comment on lines 136 to 190
```python
    mask: wp.array | torch.Tensor | None,
    all_mask: wp.array,
    scratch_mask: wp.array,
) -> wp.array:
    """Resolve ids/mask into a warp boolean mask.

    Notes:
        - Returns ``all_mask`` when both ids and mask are None (or ids is slice(None)).
        - If ids are provided and mask is None, this populates ``scratch_mask`` in-place using Warp kernels.
        - Torch inputs are supported for compatibility, but are generally not CUDA-graph-capture friendly.
    """
    # Fast path: explicit mask provided.
    if mask is not None:
        if isinstance(mask, torch.Tensor):
            # Ensure boolean + correct device, then wrap for Warp.
            if mask.dtype != torch.bool:
                mask = mask.to(dtype=torch.bool)
            if str(mask.device) != self.device:
                mask = mask.to(self.device)
            return wp.from_torch(mask, dtype=wp.bool)
        return mask

    # Fast path: ids == all / not specified.
    if ids is None:
        return all_mask
    if isinstance(ids, slice) and ids == slice(None):
        return all_mask

    # Normalize ids into a 1D wp.int32 array.
    if isinstance(ids, slice):
        # Convert to explicit indices (supports partial slices).
        # We infer the valid range from the scratch mask length.
        start, stop, step = ids.indices(scratch_mask.shape[0])
        ids = list(range(start, stop, step))

    if isinstance(ids, wp.array):
        ids_wp = ids
    elif isinstance(ids, torch.Tensor):
        if ids.dtype != torch.int32:
            ids = ids.to(dtype=torch.int32)
        if str(ids.device) != self.device:
            ids = ids.to(self.device)
        ids_wp = wp.from_torch(ids, dtype=wp.int32)
    else:
        ids_list = list(ids)
        ids_wp = wp.array(ids_list, dtype=wp.int32, device=self.device) if len(ids_list) > 0 else None

    # Populate scratch mask.
    scratch_mask.fill_(False)
    if ids_wp is not None:
        wp.launch(
            kernel=generate_mask_from_ids,
            dim=ids_wp.shape[0],
            inputs=[scratch_mask, ids_wp],
            device=self.device,
```

Scratch mask is shared — concurrent callers will race

_resolve_1d_mask populates a shared scratch buffer (scratch_mask) in-place. If multiple callers resolve masks concurrently (e.g., resolve_joint_mask for two different action terms in the same step, since both use JOINT_MASK as the scratch), the second call will overwrite the first caller's mask before it's consumed.

This is acknowledged in manager_based_env_warp.py with the comment "Not thread-safe / re-entrant", but in the articulation data there's no such warning. The Newton articulation now uses resolve_joint_mask in many places (all the set_joint_* methods). If two action terms call set_joint_effort_target with different joint subsets in sequence, the first term's resolved joint mask could be overwritten by the second term before the first kernel launch consumes it. This is safe only if Warp kernel launches are queued (CUDA stream ordering), but worth documenting or verifying.
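The hazard in sequence, with hypothetical calls:

```python
mask_a = data.resolve_joint_mask(joint_ids=[0])  # populates the shared JOINT_MASK
mask_b = data.resolve_joint_mask(joint_ids=[1])  # overwrites the same buffer
# mask_a and mask_b alias one scratch array; mask_a stays valid only if the
# kernel consuming it was already enqueued on the stream before the second call
```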

hujc7 force-pushed the dev-newton-warp-mig-manager-based branch from 7a309b8 to 94ce315 on February 18, 2026 06:19
Introduces ManagerBasedEnvWarp and ManagerBasedRLEnvWarp with
ManagerCallSwitch for per-manager stable/warp/captured mode selection.
Includes Newton articulation data extensions, RL wrapper adaptations,
and train script integration.

Warp-first implementations of all 7 managers (action, observation,
reward, termination, event, command, recorder) with mask-based reset
for CUDA graph compatibility. Includes MDP term library (observations,
rewards, terminations, events, actions), IO descriptors, and utility
modules (noise, modifiers, circular buffers, warp kernels).

Cartpole env config for the warp manager-based RL env, registered as
Isaac-Cartpole-Warp-v0. Includes warp-first custom reward term
(joint_pos_target_l2).
hujc7 force-pushed the dev-newton-warp-mig-manager-based branch from 94ce315 to a247e5d on February 20, 2026 10:12
hujc7 requested a review from hhansen-bdai as a code owner on February 20, 2026 10:12

hujc7 commented Feb 20, 2026

@greptileai review updated.

greptile-apps bot left a comment

52 files reviewed, no comments


Labels

isaac-lab Related to Isaac Lab team
