ManiSkill ↔ MetaSim / SAPIEN3 integration (native 1:1)
ManiSkill3 is a SAPIEN-backed manipulation
benchmark. RoboVerse ships MetaSim-native ManiSkill tasks that reproduce the
native ManiSkill (physx_cpu) rollout 1:1 through the standard BaseTaskEnv +
SAPIEN3 handler path — no runtime mani_skill import, so the clone is deletable.
This page is self-contained: the reproduction recipe, the per-task measured parity, and the run/verify commands are all below.
What “1:1” means here
Native ManiSkill on sim_backend="physx_cpu" is bit-deterministic, so a clean SAPIEN3
scene can reproduce it exactly. The reproduction recipe is a set of opt-in
SimParamCfg knobs on the SAPIEN3 handler (all default-off, so existing tasks are
byte-identical):
- Disable gravity on every robot link (sapien_disable_robot_gravity): this is how ManiSkill holds the arm, and it is the single biggest dynamics factor.
- Full PhysX config set globally before scene creation (sapien_apply_global_physx): solver iters, PCM/TGS, contact/rest offset, sleep/bounce thresholds, default material. This mirrors BaseEnv._set_scene_config.
- Drive with force_limit + mode="force" (sapien_drive_force_mode).
- Table as a kinematic box with the ground plane far below (sapien_ground_altitude); PrimitiveCubeCfg / PrimitiveMultiBoxCfg honor fix_base_link (kinematic).
- Controller = ManiSkill pd_joint_delta_pos (_native/control.py, parametric over arm DOF + optional mimic gripper), decimation sim_freq // control_freq.
- Gripper grasp material (recipe.apply_maniskill_gripper_friction): ManiSkill sets the panda fingers to friction 2.0 + contact-patch radius 0.1 in code (Panda.urdf_config), not in the URDF, so a plain load leaves them at the scene-default 0.3 and the grasped object slips ~1 cm during a lift. Replicating it pulls the contact-phase object trajectory back to the PhysX CPU/CUDA noise floor (~2e-4) and is what makes demo action-replay reproduce success.
- Per-object contact material (recipe.apply_object_friction, declared via a task’s object_frictions): some tasks also build an object with a custom material in code; PushT’s Tee is friction 3.0 (push_t.py). Without it the long-horizon push slides the Tee off the demonstrated path. With it, PushT drops 6.5 → 0.87 / 255; the remainder was post-success drift, not a reproduction gap: the RL demo keeps nudging the already-placed Tee for ~75 steps after the task succeeds, and at the moment of success the Tee divergence is only 0.3 mm. The renderer now ends each clip at task completion (--truncate-buffer), so the shown window is the genuine 1:1 completion: PushT 0.067 / 255, in line with the grasping tasks. (Everything physical is matched: geometry, materials, solver iters 15/1, masses/inertia, global PhysX flags. The only residual is the impulsive first-contact solver order, which enable_enhanced_determinism=False (ManiSkill’s own default) lets depend on body-creation order that two independently-built scenes cannot share.)
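The controller and decimation items above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in _native/control.py: the class name DeltaPosSketch, the ±0.1 rad delta bounds, the 100 Hz / 20 Hz frequencies, and the first-order "physics" stand-in are all hypothetical. It assumes a delta-position controller clips a normalized action to [-1, 1], rescales it to a per-step joint delta, and adds it to the current joint position; the resulting target is then held for sim_freq // control_freq physics substeps.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DeltaPosSketch:
    """Toy joint-delta-position mapping; not ManiSkill's controller class."""

    lower: float = -0.1  # assumed minimum per-step joint delta (rad)
    upper: float = 0.1   # assumed maximum per-step joint delta (rad)

    def target(self, qpos: List[float], action: List[float]) -> List[float]:
        """Map a normalized action in [-1, 1] to a joint target around qpos."""
        mid = 0.5 * (self.upper + self.lower)
        half = 0.5 * (self.upper - self.lower)
        out = []
        for q, a in zip(qpos, action):
            a = max(-1.0, min(1.0, a))  # clip to the normalized range
            out.append(q + mid + half * a)
        return out


def decimation(sim_freq: int, control_freq: int) -> int:
    """Number of physics substeps per control step."""
    if control_freq <= 0 or sim_freq % control_freq != 0:
        raise ValueError("sim_freq must be a positive multiple of control_freq")
    return sim_freq // control_freq


def control_step(qpos, action, ctrl, sim_freq=100, control_freq=20, gain=0.5):
    """Set the drive target once, then run the physics substeps."""
    tgt = ctrl.target(qpos, action)
    for _ in range(decimation(sim_freq, control_freq)):
        # Stand-in for one physics substep: move a fraction of the way to the target.
        qpos = [q + gain * (t - q) for q, t in zip(qpos, tgt)]
    return qpos, tgt


ctrl = DeltaPosSketch()
qpos, tgt = control_step([0.0, 0.5], [1.0, -2.0], ctrl)
```

The second action component (-2.0) is clipped to -1.0 before scaling, so the target moves by at most one delta bound per control step regardless of the raw action magnitude.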
Shipped tasks (15)
import roboverse_pack.tasks.maniskill registers them as maniskill.<name>_native:
| Task | robot | object pose Δ vs native | dense reward | success |
|---|---|---|---|---|
| | panda | 4.7e-6 | bitwise (5.96e-8) | bitwise |
| | panda | 2.3e-7 | bitwise | bitwise |
| | panda | 2.3e-7 | bitwise | bitwise |
| | panda | 1.5e-6 | bitwise | bitwise |
| | panda | 2.9e-7 | bitwise | bitwise |
| | panda | 3.0e-7 | bitwise | bitwise |
| | panda | 1.2e-7 | bitwise | bitwise |
| | panda | 1.2e-7 | bitwise | bitwise |
| | panda | 8.4e-7 | (no native dense) | bitwise |
| | panda | 2.8e-7 | bitwise | bitwise |
| | panda | 4.8e-5 | bitwise | bitwise |
| | panda | 5.4e-7 | (no native dense) | bitwise |
| | panda_stick | Tee Δ=0 | — | proxy |
| | panda_stick | 9.9e-6 | — | proxy |
| | 2× panda_wristcam | cube 3.3e-7 | — | proxy |
- Dynamics track native to PhysX float32 roundoff (object pose 1.2e-7–4.8e-5 over aggressive random steps; ~1e-6 under demo-like motion).
- Action-level 1:1 for every ManiSkill robot layout: panda (7 arm + 1 mimic gripper, 8-dim), panda_stick (7-dim arm-only, PushT/DrawTriangle), and multi-agent (TwoRobotPickCube, 2 × 8-dim split per robot by ManiSkillMultiRobotTask).
- Dense rewards: all 10 tasks with a native dense reward match compute_dense_reward to float32 epsilon (5.96e-8–1.19e-7).
- Success: all 12 tabletop predicates ported bitwise (including peg-in-hole and charger _compute_distance).
- is_grasped matches Panda.is_grasping (18/18, contact forces ~0.01 N) via the new sapien3 get_pairwise_contact_force.
- Reset distribution matches ManiSkill’s per-episode spawn/goal sampling (a persistent RNG advances across resets; an explicit seed is reproducible).
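For scale, the two reward bounds quoted above (5.96e-8 and 1.19e-7) coincide with the spacing between adjacent float32 values just below and just above 1.0, that is 2^-24 and 2^-23. The sketch below checks that with the standard library and shows one plausible way to reduce two traces to a single max-abs-difference figure. The helper names are hypothetical, and this is not the parity tool's actual metric code.

```python
import struct
from typing import Sequence


def next_float32_up(x: float) -> float:
    """Next representable float32 above a positive, finite float32 value."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits + 1))[0]


def ulp32(x: float) -> float:
    """Spacing between x and the next float32 above it."""
    return next_float32_up(x) - x


def max_abs_delta(native: Sequence[float], shipped: Sequence[float]) -> float:
    """Largest element-wise absolute difference between two traces."""
    if len(native) != len(shipped):
        raise ValueError("traces must have the same length")
    return max(abs(a - b) for a, b in zip(native, shipped))


# A reward trace that differs from the reference by one float32 step at 0.75.
native_rewards = [0.25, 0.75, 1.0]
shipped_rewards = [0.25, next_float32_up(0.75), 1.0]
delta = max_abs_delta(native_rewards, shipped_rewards)
```

Here delta equals 2^-24 (about 5.96e-8): a single last-bit disagreement on a reward in [0.5, 1) is enough to produce the smaller of the two quoted figures.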
Side-by-side videos: demo action-replay, 1:1 task completion
Each clip replays an official ManiSkill demonstration (the recorded pd_joint_delta_pos action
sequence) from the demo’s initial state through both native ManiSkill and the shipped
maniskill.<name>_native task, and only an episode where both sides reach the task’s success
predicate is shown — so every clip is a genuine end-to-end task completion, not a synthetic action
sweep. Three panels: left = native ManiSkill (physx_cpu), middle = the shipped task driven
through the SAPIEN3 handler with the identical actions, right = the amplified pixel difference.
Both panels are rendered in ManiSkill’s own scene (identical assets/lighting/camera), so the only
variable on screen is the physics state — the diff panel stays near-black, which is the picture of
1:1. Among the episodes where both sides succeed, the renderer picks the tightest-tracking one
(--rank-by agreement) and ends the clip at task completion (--truncate-buffer, a few frames
after the success predicate fires) — so the shown window is the genuine 1:1 completion, not the dozens
of post-success steps where an RL demo keeps fidgeting the already-placed object. Every both-success
episode completes the task by construction (success requires the manipuland to reach its goal). Each
caption gives the demo episode, the step at which the task completes, and the mean pixel diff.
Regenerate any clip with
python -m tools.maniskill_integration.render_demo_replay --task PickCube-v1 --shipped pick_cube --demo <path-to-pd_joint_delta_pos-demo.h5> --goal-actor goal_site.
pick_cube (PickCube-v1, traj_28) — completes at step 15, clip ends at task success; diff×8, 0.012/255
push_cube (PushCube-v1, traj_14) — completes at step 15; diff×8, 0.033/255
pull_cube (PullCube-v1, traj_2) — completes at step 14; diff×8, 0.020/255
stack_cube (StackCube-v1, traj_8) — completes at step 21; diff×8, 0.114/255
poke_cube (PokeCube-v1, traj_31) — completes at step 14; diff×8, 0.084/255
lift_peg_upright (LiftPegUpright-v1, traj_20) — completes at step 11; diff×8, 0.073/255
roll_ball (RollBall-v1, traj_11) — completes at step 51; diff×8, 0.082/255
pull_cube_tool (PullCubeTool-v1, traj_0) — completes at step 209; diff×8, 0.085/255
stack_pyramid (StackPyramid-v1, traj_20) — completes at step 189; diff×8, 0.062/255
push_t (PushT-v1, traj_26, panda_stick) — completes at step 21; diff×8, 0.067/255 (Tee friction-3.0 + clip ends at completion, before post-success drift)
two_robot_pick_cube (TwoRobotPickCube-v1, traj_16, dual-arm 16-dim handover) — completes at step 34; diff×8, 0.024/255
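The two numbers in each caption (the diff×8 panel and the mean diff out of 255) can be read with a sketch like the one below. It assumes 8-bit frames flattened to lists of integers, a per-pixel absolute difference, and saturation at 255 for the amplified panel; the function names are hypothetical and the renderer's own implementation may differ.

```python
from typing import List, Sequence


def mean_abs_diff(a: Sequence[int], b: Sequence[int]) -> float:
    """Mean absolute per-pixel difference, on the 0..255 scale."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("frames must be non-empty and the same size")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def amplified_diff(a: Sequence[int], b: Sequence[int], gain: int = 8) -> List[int]:
    """Per-pixel difference multiplied by gain and clipped to 255 for display."""
    return [min(255, gain * abs(x - y)) for x, y in zip(a, b)]


# Two tiny 4-pixel "frames" that disagree by one grey level in one pixel.
native_frame = [0, 10, 20, 30]
shipped_frame = [0, 10, 21, 30]

mean_diff = mean_abs_diff(native_frame, shipped_frame)   # 0.25
diff_panel = amplified_diff(native_frame, shipped_frame)  # [0, 0, 8, 0]
```

On this scale a caption value such as 0.012/255 means the average pixel differs by roughly a hundredth of one grey level, which is why the amplified panel stays near-black.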
Demo replay
Official ManiSkill pd_joint_delta_pos demos replay through the shipped tasks
(tools/maniskill_integration/replay_demo.py): seeding each episode’s initial state + goal and
replaying the recorded actions reproduces the demonstrated success on 22/25 PickCube demos. For
comparison, native physx_cpu itself reproduces 24/25 open-loop on these physx_cuda-recorded demos,
and shipped agrees with native on 23/25; the gap is the ~2e-4 contact residual near the 0.025 m
goal boundary. Pure action replay is deterministic: an identical grasp+lift sequence keeps the cube
within ~2e-4 m of native through the contact-rich phase once the gripper grasp material
(friction 2.0) is applied. Without it the cube slips ~1 cm and success flips (8/12 → 22/25).
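The sensitivity described above can be illustrated with a toy open-loop replay. Everything here is a stand-in under stated assumptions: ToyLiftEnv is a hypothetical 1-D environment (not the shipped task or replay_demo.py), the action values are made up, and the contact residual and the grasp slip are crudely modeled as offsets on the initial height. Only the 0.025 m success tolerance, the ~2e-4 residual, and the ~1 cm slip come from the text.

```python
from typing import Optional, Sequence, Tuple


class ToyLiftEnv:
    """1-D stand-in: the cube height integrates actions toward a goal height."""

    def __init__(self, goal_tol: float = 0.025) -> None:
        self.goal_tol = goal_tol
        self.height = 0.0
        self.goal = 0.0

    def set_state(self, height: float, goal: float) -> None:
        """Seed the episode's initial state and goal."""
        self.height = height
        self.goal = goal

    def step(self, action: float) -> bool:
        """Apply one recorded action and evaluate the success predicate."""
        self.height += action
        return abs(self.height - self.goal) <= self.goal_tol


def replay(env: ToyLiftEnv, init_height: float, goal: float,
           actions: Sequence[float]) -> Tuple[Optional[int], float]:
    """Open-loop replay: return the first success step (or None) and final height."""
    env.set_state(init_height, goal)
    first_success = None
    for i, a in enumerate(actions):
        if env.step(a) and first_success is None:
            first_success = i
    return first_success, env.height


actions = [0.044] * 4  # hypothetical recorded lift actions
clean = replay(ToyLiftEnv(), 0.0, 0.2, actions)        # ends 0.024 m from goal
residual = replay(ToyLiftEnv(), -2e-4, 0.2, actions)   # ~2e-4 m offset
slipped = replay(ToyLiftEnv(), -0.01, 0.2, actions)    # ~1 cm slip
```

With these numbers the clean and 2e-4-offset replays both succeed at the last step, while the 1 cm slip ends 0.034 m from the goal and never satisfies the predicate. A demo that finishes even closer to the 0.025 m boundary could flip on the 2e-4 residual alone.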
Run
conda activate maniskill1to1 # roboverse env + mani_skill + sapien3
# instantiate a shipped native task (standard registry path)
python -c "
import roboverse_pack.tasks.maniskill, copy, torch
from metasim.task.registry import get_task_class
cls = get_task_class('maniskill.pick_cube_native')
sc = copy.deepcopy(cls.scenario); sc.simulator='sapien3'; sc.num_envs=1; sc.headless=True; sc.cameras=[]
env = cls(sc); env.reset(seed=0)
for _ in range(50): obs, rew, term, trunc, info = env.step(torch.zeros((1,8)))
"
# measure native<->recipe parity (single-agent: 14 tasks; multi-agent: separate tool)
SAPIEN_HEADLESS=1 python -m tools.maniskill_integration.parity_native --all --steps 30
SAPIEN_HEADLESS=1 python -m tools.maniskill_integration.parity_multi_agent --task TwoRobotPickCube-v1
# side-by-side demo action-replay video (native | shipped task | diff), 1:1 task completion
SAPIEN_HEADLESS=1 python -m tools.maniskill_integration.render_demo_replay \
--task PickCube-v1 --shipped pick_cube --goal-actor goal_site \
--demo ~/.maniskill/demos/PickCube-v1/rl/trajectory.none.pd_joint_delta_pos.physx_cuda.h5
# replay official ManiSkill demos through the shipped task
SAPIEN_HEADLESS=1 python -m tools.maniskill_integration.replay_demo --task pick_cube --episodes 25
# regression tests
python -m pytest tests/test_maniskill_native_task.py tests/test_maniskill_reward_grasp.py \
tests/test_maniskill_success.py tests/test_maniskill_action_levels.py \
tests/test_maniskill_reset.py tests/test_maniskill_demo_replay.py \
tests/test_maniskill_gripper_friction.py
Assets
The ManiSkill panda assets (panda_v2.urdf for the gripper arm, panda_stick.urdf,
panda_v3.urdf for the wrist-cam arm, + franka_description / realsense meshes) are
vendored under roboverse_data/robots/maniskill_panda/ and published to the
HuggingFace-backed RoboVerseOrg/roboverse_data dataset. The locators
(_native/recipe.panda_urdf_path etc.) prefer the local copy, else token-free
snapshot_download from HF, else fall back to an installed mani_skill package — so the
clone loads its robots without a mani_skill install.
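The "local copy, else download, else installed package" order described above is a plain fallback chain. A minimal sketch follows, assuming each source is wrapped in a zero-argument callable that returns a path or None. The resolve_asset helper and the three stand-in resolvers are hypothetical and do not reproduce _native/recipe.panda_urdf_path; in particular, no real download is attempted here.

```python
import tempfile
from pathlib import Path
from typing import Callable, Iterable, Optional

Resolver = Callable[[], Optional[Path]]


def resolve_asset(resolvers: Iterable[Resolver]) -> Path:
    """Return the first existing path produced by the resolvers, in order."""
    for resolver in resolvers:
        try:
            candidate = resolver()
        except Exception:
            # A failing source (missing package, no network) just falls through.
            continue
        if candidate is not None and Path(candidate).exists():
            return Path(candidate)
    raise FileNotFoundError("no resolver produced an existing asset path")


def missing_local() -> Optional[Path]:
    """Stand-in for a local lookup that finds nothing."""
    return None


def failing_download() -> Optional[Path]:
    """Stand-in for a download attempt while offline."""
    raise ConnectionError("offline")


with tempfile.TemporaryDirectory() as tmp:
    urdf = Path(tmp) / "panda_v2.urdf"
    urdf.write_text("<robot name='placeholder'/>")
    # Third stand-in plays the role of the last-resort source.
    found = resolve_asset([missing_local, failing_download, lambda: urdf])
    found_name = found.name
```

Swallowing exceptions per source keeps an optional dependency from breaking the chain; the trade-off is that the final FileNotFoundError does not say which source failed or why, so a real locator may want to collect and report those reasons.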
Backward compatibility
All MetaSim-side changes are opt-in (SimParamCfg knobs default to None/False; the new
PrimitiveMultiBoxCfg, get_pairwise_contact_force, and PrimitiveCubeCfg
fix_base_link handling are additive) — existing SAPIEN3 tasks are byte-identical
(417 sapien3 + general MetaSim tests pass). The RoboVerse side is purely additive
(_native/ package + tasks + tools + tests). Both MetaSim and RoboVerse changes are merged
to their respective public main branches.
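The opt-in pattern can be pictured with a small stand-in config. The dataclass below is hypothetical and is not MetaSim's SimParamCfg: it only borrows the four knob names mentioned on this page, and the field types (three booleans and an optional float) and the example altitude are assumptions.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class RecipeKnobsSketch:
    """Stand-in for the opt-in knobs; every default leaves behavior unchanged."""

    sapien_disable_robot_gravity: bool = False
    sapien_apply_global_physx: bool = False
    sapien_drive_force_mode: bool = False
    sapien_ground_altitude: Optional[float] = None  # assumed type


def active_knobs(cfg: RecipeKnobsSketch) -> Dict[str, Any]:
    """Return only the knobs that were explicitly switched on."""
    return {
        name: value
        for name, value in asdict(cfg).items()
        if value is not None and value is not False
    }


existing_task = RecipeKnobsSketch()  # an existing task sets nothing
native_task = RecipeKnobsSketch(
    sapien_disable_robot_gravity=True,
    sapien_ground_altitude=-1.0,  # hypothetical value
)

existing_active = active_knobs(existing_task)
native_active = active_knobs(native_task)
```

An existing task that never mentions the knobs ends up with an empty set of active overrides, which is the property the byte-identical claim relies on.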
Demo-replay completion coverage
11 tasks have a verified side-by-side clip above where native ManiSkill and the shipped task
both reach success under pure action-replay of an official demo: pick_cube, push_cube,
pull_cube, stack_cube, poke_cube, lift_peg_upright, roll_ball, pull_cube_tool,
stack_pyramid, push_t (panda_stick), and two_robot_pick_cube (dual-arm, 16-dim). This spans
all three robot layouts.
Four tasks are not shown as demo-replay completions:
- peg_insertion_side and plug_charger randomize their internal geometry per episode (the box-hole position / charger prong layout). The shipped task uses a fixed peg/hole, so a demo whose hole sits elsewhere cannot be inserted by pure open-loop replay, and high-precision insertion does not reproduce open-loop even on native physx_cpu without per-step env-state injection. Their success/reward formulas are still ported bitwise; only the per-episode geometry is unreproduced.
- draw_triangle has only a proxy success (canvas-trace overlap) that does not fire under replay of a demo authored against a different target triangle.
- place_sphere has no upstream ManiSkill demo dataset.