ManiSkill ↔ MetaSim / SAPIEN3 integration (native 1:1)#

ManiSkill3 is a SAPIEN-backed manipulation benchmark. RoboVerse ships MetaSim-native ManiSkill tasks that reproduce the native ManiSkill (physx_cpu) rollout 1:1 through the standard BaseTaskEnv + SAPIEN3 handler path. Nothing imports mani_skill at runtime, so the ManiSkill clone can be deleted.

This page is self-contained: the reproduction recipe, the per-task measured parity, and the run/verify commands are all below.

What “1:1” means here#

Native ManiSkill on sim_backend="physx_cpu" is bit-deterministic, so a clean SAPIEN3 scene can reproduce it exactly. The reproduction recipe is a set of opt-in SimParamCfg knobs on the SAPIEN3 handler (all default-off, so existing tasks are byte-identical):

Disable gravity on every robot link (sapien_disable_robot_gravity) — how ManiSkill holds the arm; the single biggest dynamics factor.
Full PhysX config set globally before scene creation (sapien_apply_global_physx): solver iters, PCM/TGS, contact/rest offset, sleep/bounce thresholds, default material — mirrors BaseEnv._set_scene_config.
Drive with force_limit + mode="force" (sapien_drive_force_mode).
Table as a kinematic box with the ground plane far below (sapien_ground_altitude); PrimitiveCubeCfg / PrimitiveMultiBoxCfg honor fix_base_link (kinematic).
Controller = ManiSkill pd_joint_delta_pos (_native/control.py, parametric over arm DOF + optional mimic gripper), decimation sim_freq // control_freq.
Gripper grasp material (recipe.apply_maniskill_gripper_friction) — ManiSkill sets the panda fingers to friction 2.0 + contact-patch radius 0.1 in code (Panda.urdf_config), not in the URDF, so a plain load leaves them at the scene-default 0.3 and the grasped object slips ~1 cm during a lift. Replicating it pulls the contact-phase object trajectory back to the PhysX CPU/CUDA noise floor (~2e-4) and is what makes demo action-replay reproduce success.
Per-object contact material (recipe.apply_object_friction, declared via a task’s object_frictions): some tasks also build an object with a custom material in code. PushT’s Tee has friction 3.0 (push_t.py); without it, the long-horizon push slides the Tee off the demonstrated path. With it, the PushT mean pixel diff drops from 6.5 to 0.87 / 255. The remainder was post-success drift, not a reproduction gap: the RL demo keeps nudging the already-placed Tee for ~75 steps after the task succeeds, and at the moment of success the Tee divergence is only 0.3 mm. The renderer now ends each clip at task completion (--truncate-buffer), so the shown window is the genuine 1:1 completion: PushT 0.067 / 255, in line with the grasping tasks. Everything physical is matched (geometry, materials, solver iters 15/1, masses/inertia, global PhysX flags). The only residual is the solver’s ordering of the impulsive first contact: with enable_enhanced_determinism=False, which is ManiSkill’s own default, that ordering depends on body-creation order, and two independently built scenes cannot share it.
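The controller item in the list above can be illustrated with a minimal sketch. It assumes a normalized action in [-1, 1] mapped to a per-step joint delta of at most ±0.1 rad and a 100 Hz / 20 Hz sim/control split. Those numbers and both function names are illustrative assumptions, not read from _native/control.py, and the mimic gripper is left out.

```python
def decimation(sim_freq: int, control_freq: int) -> int:
    """Number of physics substeps executed per control step."""
    if sim_freq % control_freq != 0:
        raise ValueError("sim_freq must be a multiple of control_freq")
    return sim_freq // control_freq


def delta_pos_targets(prev_targets, action, max_delta=0.1):
    """Turn a normalized delta action into new joint position targets."""
    if len(prev_targets) != len(action):
        raise ValueError("action and target dimensions differ")
    new_targets = []
    for q, a in zip(prev_targets, action):
        a = max(-1.0, min(1.0, a))  # clip to the normalized range
        new_targets.append(q + a * max_delta)  # max_delta is an assumed scale
    return new_targets


substeps = decimation(sim_freq=100, control_freq=20)
targets = delta_pos_targets([0.0, 0.5, -0.3], [1.0, -0.5, 2.0])
```

Each control step would then hold `targets` fixed while the simulator advances `substeps` physics steps.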

Shipped tasks (15)#

import roboverse_pack.tasks.maniskill registers them as maniskill.<name>_native:

Task	robot	object pose Δ vs native	dense reward	success
`pick_cube`	panda	4.7e-6	bitwise (5.96e-8)	bitwise
`push_cube`	panda	2.3e-7	bitwise	bitwise
`pull_cube`	panda	2.3e-7	bitwise	bitwise
`stack_cube`	panda	1.5e-6	bitwise	bitwise
`poke_cube`	panda	2.9e-7	bitwise	bitwise
`lift_peg_upright`	panda	3.0e-7	bitwise	bitwise
`roll_ball`	panda	1.2e-7	bitwise	bitwise
`place_sphere`	panda	1.2e-7	bitwise	bitwise
`stack_pyramid`	panda	8.4e-7	(no native dense)	bitwise
`pull_cube_tool`	panda	2.8e-7	bitwise	bitwise
`peg_insertion_side`	panda	4.8e-5	bitwise	bitwise
`plug_charger`	panda	5.4e-7	(no native dense)	bitwise
`push_t`	panda_stick	Tee Δ=0	—	proxy
`draw_triangle`	panda_stick	9.9e-6	—	proxy
`two_robot_pick_cube`	2× panda_wristcam	cube 3.3e-7	—	proxy
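For scale, the 5.96e-8 figure in the dense-reward column equals float32 unit roundoff (2^-24), and twice that is float32 machine epsilon (2^-23, about 1.19e-7). The sketch below checks this with the standard library only. `to_f32` and `max_abs_delta` are illustrative helper names, and the sample poses are made up.

```python
import struct

F32_EPS = 2.0 ** -23            # float32 machine epsilon, about 1.19e-7
F32_UNIT_ROUNDOFF = 2.0 ** -24  # half of that, about 5.96e-8


def to_f32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def max_abs_delta(traj_a, traj_b):
    """Largest per-coordinate gap between two equally long pose trajectories."""
    return max(
        abs(a - b)
        for pose_a, pose_b in zip(traj_a, traj_b)
        for a, b in zip(pose_a, pose_b)
    )


# Made-up sample: the same rollout held in float64 and rounded to float32.
native = [(0.1, 0.2, 0.3), (0.15, 0.25, 0.35)]
shipped = [tuple(to_f32(v) for v in pose) for pose in native]
gap = max_abs_delta(native, shipped)
```

A gap at or below these constants cannot be distinguished from storage precision, which is why the table labels such entries bitwise.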

Dynamics track native to PhysX float32 roundoff (object pose 1.2e-7–4.8e-5 over aggressive random steps; ~1e-6 under demo-like motion).
Action-level 1:1 for every ManiSkill robot layout: panda (7 arm + 1 mimic gripper, 8-dim), panda_stick (7-dim arm-only, PushT/DrawTriangle), and multi-agent (TwoRobotPickCube, 2 × 8-dim split per robot by ManiSkillMultiRobotTask).
Dense rewards: all 10 tasks with a native dense reward match compute_dense_reward to float32 epsilon (5.96e-8–1.19e-7).
Success: all 12 tabletop predicates ported bitwise (including peg-in-hole and charger _compute_distance).
is_grasped matches Panda.is_grasping (18/18, contact forces ~0.01 N) via the new sapien3 get_pairwise_contact_force.
Reset distribution matches ManiSkill’s per-episode spawn/goal sampling (a persistent RNG advances across resets; an explicit seed is reproducible).
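The reset behavior in the last item can be pictured with a small stand-in. `SpawnSampler` is a hypothetical class and the uniform ±0.1 m spawn range is an assumption for illustration; this is not the shipped reset code.

```python
import random


class SpawnSampler:
    """Toy reset sampler: one RNG persists across resets unless reseeded."""

    def __init__(self, seed: int = 0):
        self._rng = random.Random(seed)

    def reset(self, seed=None):
        if seed is not None:
            # An explicit seed restarts the stream, so the draw is reproducible.
            self._rng = random.Random(seed)
        # Without a seed, the persistent RNG simply advances.
        return (self._rng.uniform(-0.1, 0.1), self._rng.uniform(-0.1, 0.1))


sampler = SpawnSampler()
first = sampler.reset(seed=42)
second = sampler.reset()           # no seed: next draw from the same stream
replayed = sampler.reset(seed=42)  # same seed: same spawn as `first`
```

Here `first` and `replayed` coincide, while `second` differs because the stream moved on between resets.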

Side-by-side videos — demo action-replay, 1:1 task completion#

Each clip replays an official ManiSkill demonstration (the recorded pd_joint_delta_pos action sequence) from the demo’s initial state through both native ManiSkill and the shipped maniskill.<name>_native task. Only an episode where both sides reach the task’s success predicate is shown, so every clip is a genuine end-to-end task completion, not a synthetic action sweep. There are three panels: left = native ManiSkill (physx_cpu), middle = the shipped task driven through the SAPIEN3 handler with the identical actions, right = the amplified pixel difference. The left and middle panels are both rendered in ManiSkill’s own scene (identical assets/lighting/camera), so the only variable on screen is the physics state; the diff panel stays near-black, which is what 1:1 looks like.

Among the episodes where both sides succeed, the renderer picks the tightest-tracking one (--rank-by agreement) and ends the clip at task completion (--truncate-buffer, a few frames after the success predicate fires). The shown window is therefore the genuine 1:1 completion, not the dozens of post-success steps where an RL demo keeps fidgeting with the already-placed object. Every episode in which both sides succeed completes the task by construction, since success requires the manipuland to reach its goal. Each caption gives the demo episode, the step at which the task completes, and the mean pixel diff.
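The selection rule described above (keep only episodes where both sides succeed, rank by agreement, cut shortly after success) can be sketched as follows. The `Episode` fields, the use of mean pixel diff as the ranking key, and the 3-frame buffer are assumptions for illustration; what --rank-by agreement and --truncate-buffer actually do is defined by render_demo_replay. The sample episodes are made up.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Episode:
    name: str
    native_success_step: Optional[int]   # None if native never succeeds
    shipped_success_step: Optional[int]  # None if the shipped task never succeeds
    mean_pixel_diff: float               # lower means tighter tracking
    num_frames: int


def pick_clip(episodes: List[Episode], buffer_frames: int = 3):
    """Return (episode, last_frame_index) for the tightest both-success episode."""
    both = [
        e for e in episodes
        if e.native_success_step is not None and e.shipped_success_step is not None
    ]
    if not both:
        return None
    best = min(both, key=lambda e: e.mean_pixel_diff)
    done = max(best.native_success_step, best.shipped_success_step)
    last = min(done + buffer_frames, best.num_frames - 1)  # stay inside the clip
    return best, last


episodes = [
    Episode("traj_3", 15, None, 0.010, 50),  # shipped side fails: excluded
    Episode("traj_7", 16, 16, 0.040, 50),
    Episode("traj_28", 15, 15, 0.012, 50),
]
chosen, last_frame = pick_clip(episodes)
```

Filtering before ranking matters here: traj_3 has the lowest diff but is never eligible, because one side did not complete the task.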

Regenerate any clip with render_demo_replay, passing the matching task, shipped name, and demo file. For pick_cube: python -m tools.maniskill_integration.render_demo_replay --task PickCube-v1 --shipped pick_cube --demo <path-to-pd_joint_delta_pos-demo.h5> --goal-actor goal_site.

pick_cube (PickCube-v1, traj_28) — completes at step 15, clip ends at task success; diff×8, 0.012/255

push_cube (PushCube-v1, traj_14) — completes at step 15; diff×8, 0.033/255

pull_cube (PullCube-v1, traj_2) — completes at step 14; diff×8, 0.020/255

stack_cube (StackCube-v1, traj_8) — completes at step 21; diff×8, 0.114/255

poke_cube (PokeCube-v1, traj_31) — completes at step 14; diff×8, 0.084/255

lift_peg_upright (LiftPegUpright-v1, traj_20) — completes at step 11; diff×8, 0.073/255

roll_ball (RollBall-v1, traj_11) — completes at step 51; diff×8, 0.082/255

pull_cube_tool (PullCubeTool-v1, traj_0) — completes at step 209; diff×8, 0.085/255

stack_pyramid (StackPyramid-v1, traj_20) — completes at step 189; diff×8, 0.062/255

push_t (PushT-v1, traj_26, panda_stick) — completes at step 21; diff×8, 0.067/255 (Tee friction-3.0 + clip ends at completion, before post-success drift)

two_robot_pick_cube (TwoRobotPickCube-v1, traj_16, dual-arm 16-dim handover) — completes at step 34; diff×8, 0.024/255
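As a reading aid for the captions, one plausible way to compute a "diff×8" panel and a mean-over-255 figure is shown below. This is a sketch on tiny grayscale frames stored as nested lists. The renderer’s exact formula is not given on this page, so the gain handling and the averaging are assumptions.

```python
def diff_panel(frame_a, frame_b, gain=8):
    """Per-pixel absolute difference, amplified by `gain` and clipped to 255."""
    return [
        [min(255, gain * abs(a - b)) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(frame_a, frame_b)
    ]


def mean_abs_diff(frame_a, frame_b):
    """Mean unamplified absolute difference on the 0-255 scale."""
    diffs = [
        abs(a - b)
        for row_a, row_b in zip(frame_a, frame_b)
        for a, b in zip(row_a, row_b)
    ]
    return sum(diffs) / len(diffs)


# Made-up 2x2 frames with one small and one large disagreement.
native = [[10, 10], [200, 40]]
shipped = [[10, 11], [200, 90]]
panel = diff_panel(native, shipped)
score = mean_abs_diff(native, shipped)
```

Under this reading, a caption value such as 0.012/255 means the two renders differ by about a hundredth of one intensity level per pixel on average, before amplification.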

Demo replay#

Official ManiSkill pd_joint_delta_pos demos replay through the shipped tasks (tools/maniskill_integration/replay_demo.py): seeding each episode’s initial state and goal, then replaying the recorded actions, reproduces the demonstrated success on 22/25 PickCube demos. For comparison, native physx_cpu itself reproduces 24/25 open-loop on these physx_cuda-recorded demos, and the shipped task agrees with native on 23/25; the gap is the ~2e-4 contact residual near the 0.025 m goal boundary. Pure action replay is deterministic: once the gripper grasp material (friction 2.0) is applied, an identical grasp+lift sequence keeps the cube within ~2e-4 m of native through the contact-rich phase. Without it the cube slips ~1 cm and success flips (8/12 without the material, 22/25 with it).
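To see why a ~2e-4 m residual can flip a few episodes, consider a distance-threshold success check with the 0.025 m goal boundary quoted above. The predicate below is a simplified stand-in (distance only, made-up positions), not the ported PickCube success function.

```python
import math

GOAL_THRESH = 0.025  # metres, the goal boundary quoted in the text


def is_success(obj_pos, goal_pos, thresh=GOAL_THRESH):
    """Simplified success: object centre within `thresh` of the goal."""
    return math.dist(obj_pos, goal_pos) <= thresh


goal = (0.0, 0.0, 0.2)
native_cube = (0.0249, 0.0, 0.2)          # 0.1 mm inside the boundary
shipped_cube = (0.0249 + 2e-4, 0.0, 0.2)  # same pose plus a 2e-4 m residual

native_ok = is_success(native_cube, goal)
shipped_ok = is_success(shipped_cube, goal)
```

An episode that ends this close to the boundary succeeds on one side and fails on the other even though the two trajectories agree to a fifth of a millimetre.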

Run#

conda activate maniskill1to1   # roboverse env + mani_skill + sapien3

# instantiate a shipped native task (standard registry path)
python -c "
import roboverse_pack.tasks.maniskill  # registers the maniskill.<name>_native tasks
import copy
import torch
from metasim.task.registry import get_task_class
cls = get_task_class('maniskill.pick_cube_native')
sc = copy.deepcopy(cls.scenario)  # do not mutate the class-level scenario
sc.simulator = 'sapien3'
sc.num_envs = 1
sc.headless = True
sc.cameras = []
env = cls(sc)
env.reset(seed=0)
for _ in range(50):  # 8-dim zero action: 7 arm joints + 1 gripper
    obs, rew, term, trunc, info = env.step(torch.zeros((1, 8)))
"

# measure native<->recipe parity (single-agent: 14 tasks; multi-agent: separate tool)
SAPIEN_HEADLESS=1 python -m tools.maniskill_integration.parity_native --all --steps 30
SAPIEN_HEADLESS=1 python -m tools.maniskill_integration.parity_multi_agent --task TwoRobotPickCube-v1

# side-by-side demo action-replay video (native | shipped task | diff), 1:1 task completion
SAPIEN_HEADLESS=1 python -m tools.maniskill_integration.render_demo_replay \
    --task PickCube-v1 --shipped pick_cube --goal-actor goal_site \
    --demo ~/.maniskill/demos/PickCube-v1/rl/trajectory.none.pd_joint_delta_pos.physx_cuda.h5

# replay official ManiSkill demos through the shipped task
SAPIEN_HEADLESS=1 python -m tools.maniskill_integration.replay_demo --task pick_cube --episodes 25

# regression tests
python -m pytest tests/test_maniskill_native_task.py tests/test_maniskill_reward_grasp.py \
    tests/test_maniskill_success.py tests/test_maniskill_action_levels.py \
    tests/test_maniskill_reset.py tests/test_maniskill_demo_replay.py \
    tests/test_maniskill_gripper_friction.py

Assets#

The ManiSkill panda assets (panda_v2.urdf for the gripper arm, panda_stick.urdf, panda_v3.urdf for the wrist-cam arm, + franka_description / realsense meshes) are vendored under roboverse_data/robots/maniskill_panda/ and published to the HuggingFace-backed RoboVerseOrg/roboverse_data dataset. The locators (_native/recipe.panda_urdf_path etc.) prefer the local copy, else token-free snapshot_download from HF, else fall back to an installed mani_skill package — so the clone loads its robots without a mani_skill install.
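The lookup order above amounts to a first-hit fallback chain. A minimal sketch follows, with the download and package lookups passed in as plain callables so the example stays offline. `resolve_asset` and its arguments are hypothetical and do not mirror the signature of recipe.panda_urdf_path.

```python
import tempfile
from pathlib import Path
from typing import Callable, Optional, Sequence


def resolve_asset(
    local_path: Path,
    fallbacks: Sequence[Callable[[], Optional[Path]]],
) -> Path:
    """Return the local copy if present, else the first fallback that yields a file."""
    if local_path.is_file():
        return local_path
    for fetch in fallbacks:
        try:
            candidate = fetch()
        except Exception:
            continue  # e.g. offline or package missing: try the next source
        if candidate is not None and candidate.is_file():
            return candidate
    raise FileNotFoundError(f"no source provided {local_path.name}")


with tempfile.TemporaryDirectory() as tmp:
    cached = Path(tmp) / "panda_v2.urdf"
    cached.write_text("<robot/>")  # stands in for a downloaded asset

    def failing_download():
        raise OSError("offline")

    found = resolve_asset(
        Path(tmp) / "missing" / "panda_v2.urdf",
        fallbacks=[failing_download, lambda: cached],
    )
    found_name = found.name
```

Ordering the sources from cheapest to most demanding keeps the common case (a local copy) free of network access and optional dependencies.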

Backward compatibility#

All MetaSim-side changes are opt-in (SimParamCfg knobs default to None/False; the new PrimitiveMultiBoxCfg, get_pairwise_contact_force, and PrimitiveCubeCfg fix_base_link handling are additive) — existing SAPIEN3 tasks are byte-identical (417 sapien3 + general MetaSim tests pass). The RoboVerse side is purely additive (_native/ package + tasks + tools + tests). Both MetaSim and RoboVerse changes are merged to their respective public main branches.
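The opt-in guarantee can be pictured with a stand-in config. `RecipeKnobs` below is a hypothetical dataclass that borrows the knob names from this page. The field types and defaults are assumptions consistent with "None/False", not the real SimParamCfg definition.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class RecipeKnobs:
    # Field types are assumed for illustration; only the names come from this page.
    sapien_disable_robot_gravity: bool = False
    sapien_apply_global_physx: bool = False
    sapien_drive_force_mode: bool = False
    sapien_ground_altitude: Optional[float] = None  # None keeps the handler default


def active_knobs(cfg: RecipeKnobs):
    """Names of knobs that differ from their default-off value."""
    default = RecipeKnobs()
    return [
        f.name for f in fields(cfg)
        if getattr(cfg, f.name) != getattr(default, f.name)
    ]


legacy = RecipeKnobs()  # an existing task: nothing enabled
native = RecipeKnobs(sapien_disable_robot_gravity=True, sapien_ground_altitude=-1.0)
```

A task that never sets a knob yields an empty `active_knobs` list, which is the property that lets existing SAPIEN3 tasks keep their previous behavior.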

Demo-replay completion coverage#

11 tasks have a verified side-by-side clip above where native ManiSkill and the shipped task both reach success under pure action-replay of an official demo: pick_cube, push_cube, pull_cube, stack_cube, poke_cube, lift_peg_upright, roll_ball, pull_cube_tool, stack_pyramid, push_t (panda_stick), and two_robot_pick_cube (dual-arm, 16-dim). This spans all three robot layouts.

Four tasks are not shown as demo-replay completions, for the following reasons:

peg_insertion_side and plug_charger randomize their internal geometry per episode (the box-hole position / charger prong layout). The shipped task uses a fixed peg/hole, so a demo recorded against a hole placed elsewhere cannot complete the insertion by pure open-loop replay; high-precision insertion does not reproduce open-loop even on native physx_cpu without per-step env-state injection. Their success/reward formulas are still ported bitwise; only the per-episode geometry is not reproduced.
draw_triangle has only a proxy success (canvas-trace overlap) that does not fire under replay of a demo authored against a different target triangle.
place_sphere has no upstream ManiSkill demo dataset.