UniMorphGrasp: Diffusion Model with Morphology-Awareness for Cross-Embodiment Dexterous Grasp Generation

¹King's College London, ²Imperial College London, Corresponding Authors

Abstract

Cross-embodiment dexterous grasping aims to generate stable and diverse grasps for robotic hands with heterogeneous kinematic structures. Existing methods are often tailored to specific hand designs and fail to generalize to unseen hand morphologies outside the training distribution. To address these limitations, we propose UniMorphGrasp, a diffusion-based framework that incorporates hand morphological information into the grasp generation process for unified cross-embodiment grasp synthesis. The proposed approach maps grasps from diverse robotic hands into a unified human-like canonical hand pose representation, providing a common space for learning. Grasp generation is then conditioned on structured representations of hand kinematics, encoded as graphs derived from hand configurations, together with object geometry. In addition, a loss function is introduced that exploits the hierarchical organization of hand kinematics to guide joint-level supervision. Extensive experiments demonstrate that UniMorphGrasp achieves state-of-the-art performance on existing dexterous grasp benchmarks and exhibits strong zero-shot generalization to previously unseen hand structures, enabling scalable and practical cross-embodiment grasp deployment.
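The unified canonical representation described above can be illustrated with a minimal sketch: heterogeneous hands are padded into one fixed-size, human-like pose vector, and an active-joint mask records which canonical slots the hand actually actuates. The canonical layout (five fingers, four joints each) and the per-hand joint counts below are illustrative assumptions, not the paper's exact parameterization.

```python
# Sketch of mapping heterogeneous hand poses into a fixed-size canonical
# (human-like) pose vector with an active-joint mask.
# The 5-finger x 4-joint layout and example joint counts are assumptions.
import numpy as np

CANONICAL_FINGERS = 5
JOINTS_PER_FINGER = 4
CANONICAL_DIM = CANONICAL_FINGERS * JOINTS_PER_FINGER  # 20 joint slots

def to_canonical(joint_angles_per_finger):
    """Map a hand's per-finger joint angles into the canonical vector.

    joint_angles_per_finger: list of arrays, one per physical finger,
    each holding that finger's joint angles (<= JOINTS_PER_FINGER).
    Returns (pose, mask): pose is the zero-padded canonical vector; mask
    marks which canonical joint slots exist on this hand.
    """
    pose = np.zeros(CANONICAL_DIM)
    mask = np.zeros(CANONICAL_DIM, dtype=bool)
    for f, angles in enumerate(joint_angles_per_finger):
        start = f * JOINTS_PER_FINGER
        n = len(angles)
        pose[start:start + n] = angles
        mask[start:start + n] = True
    return pose, mask

# A three-fingered hand (Barrett-like) occupies only part of the slots;
# unused slots stay zero and are masked out.
barrett_like = [np.array([0.1, 0.2]), np.array([0.0, 0.3]), np.array([0.2, 0.1])]
pose, mask = to_canonical(barrett_like)
```

Because every hand lands in the same vector space, one generative model can be trained across embodiments, with the mask distinguishing real joints from padding.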

Pipeline Overview

UniMorphGrasp Pipeline
(Left) Overview of the proposed UniMorphGrasp framework for cross-embodiment dexterous grasp generation. Given an object point cloud and an arbitrary hand morphology extracted from its URDF specification (mapped to a pre-defined canonical hand format), a morphology encoder extracts morphology representations from the hand's joint structure. The hand pose (noised by the diffusion scheduler during training) is embedded through a linear layer and concatenated with its active-joint-mask embedding to form the hand representation. This representation is then processed by a morphology-aware denoising model, whose iterative denoising is conditioned on both the morphology representation and the point-cloud representation extracted by a Point Transformer. The entire framework is trained with a morphology-aware loss function. (Right) The structure of the morphology-aware denoising model, which is conditioned on the encoded morphology and point-cloud representations via cross-attention.
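The cross-attention conditioning in the denoising model can be sketched as follows: noised pose tokens act as queries over the concatenated morphology and point-cloud tokens. The dimensions, single-head formulation, and plain-NumPy projections are illustrative assumptions; the actual model uses learned projections inside full transformer blocks.

```python
# Minimal single-head cross-attention sketch: the denoiser's hand-pose
# tokens attend to conditioning tokens (morphology + point-cloud features).
# Token counts, dimensions, and random projection matrices are assumptions.
import numpy as np

def cross_attention(queries, context, Wq, Wk, Wv):
    """queries: (Nq, d) pose tokens; context: (Nc, d) conditioning tokens."""
    q = queries @ Wq          # (Nq, dk)
    k = context @ Wk          # (Nc, dk)
    v = context @ Wv          # (Nc, dk)
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over context
    return weights @ v        # (Nq, dk): pose tokens updated by conditions

rng = np.random.default_rng(0)
d, dk = 16, 8
pose_tokens = rng.normal(size=(4, d))    # embedded noised hand pose
morph_tokens = rng.normal(size=(10, d))  # morphology-graph features
pcd_tokens = rng.normal(size=(32, d))    # Point Transformer features
context = np.concatenate([morph_tokens, pcd_tokens], axis=0)
Wq, Wk, Wv = (rng.normal(size=(d, dk)) for _ in range(3))
out = cross_attention(pose_tokens, context, Wq, Wk, Wv)
```

Concatenating both condition sources into one context lets each pose token weight object geometry and hand structure jointly at every denoising step.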

Method Performance

MultiDex Performance
UniMorphGrasp generates stable and diverse grasps across dexterous hands with heterogeneous embodiments.
More MultiDex Results
Additional grasps generated by UniMorphGrasp on the MultiDex dataset.
Quantitative Results 1
Quantitative comparison of UniMorphGrasp (with and without the morphology-aware loss) against cross-embodiment dexterous grasp synthesis baselines on three robotic hands with three to five fingers: the Barrett, Allegro, and Shadow hands.
Qualitative Comparison
Qualitative comparison with two baselines, GenDexGrasp and DRO-Grasp; our results show superior surface conformity and more stable form closure.

Zero-Shot Generalization to Novel Hand Morphologies

Generalization 1
Topological Variations: We selectively remove fingers from the Shadow hand.
Generalization 2
Geometrical Variations: We scale the finger lengths by a factor of 1.5 (lengthened).
Generalization 3
Geometrical Variations: We scale the finger lengths by a factor of 0.8 (shortened).
Generalization 4
Embodiment Variations: We replace Shadow hand fingers with Allegro hand fingers, introducing embodiment changes in joint axes, joint limits, and link geometry.
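The geometric variations above amount to editing the hand's URDF before evaluation. A minimal sketch of the finger-length scaling, using a hypothetical one-link URDF snippet (the real Shadow hand URDF uses meshes and many links; taking a box's z extent as the link length is an assumption for illustration):

```python
# Sketch of scaling finger-link lengths in a URDF, as used to build the
# "lengthened" (1.5x) and "shortened" (0.8x) test hands. The toy one-link
# URDF and the box-geometry length convention are illustrative assumptions.
import xml.etree.ElementTree as ET

URDF = """<robot name="toy_hand">
  <link name="index_proximal">
    <visual><geometry><box size="0.01 0.01 0.04"/></geometry></visual>
  </link>
</robot>"""

def scale_link_lengths(urdf_text, factor):
    root = ET.fromstring(urdf_text)
    for box in root.iter("box"):
        x, y, z = (float(v) for v in box.get("size").split())
        box.set("size", f"{x} {y} {round(z * factor, 6)}")  # scale along z
    return ET.tostring(root, encoding="unicode")

lengthened = scale_link_lengths(URDF, 1.5)  # 0.04 -> 0.06
shortened = scale_link_lengths(URDF, 0.8)   # 0.04 -> 0.032
```

Since the model conditions only on the morphology parsed from the URDF, such edited hands can be evaluated zero-shot without any retraining.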

Cross-Dataset Results

We conduct cross-dataset experiments on the Multi-GraspLLM and Objaverse datasets to assess the zero-shot generalization capability of our model.

Multi-GraspLLM Results
Visualizations of cross-embodiment grasps synthesized by UniMorphGrasp on the Multi-GraspLLM dataset.
Objaverse Results
Visualizations of cross-embodiment grasps synthesized by UniMorphGrasp on the Objaverse dataset.
Quantitative Results 2
Cross-dataset zero-shot generalization results. We evaluate models trained on MultiDex directly on unseen datasets: Multi-GraspLLM and Objaverse.

Real-World Experiments

We validate UniMorphGrasp in real-world scenarios using a UR5e arm equipped with a Leap Hand.

Real World 1 Real World 2

Real-world grasping demonstrations on the Leap Hand.

Quantitative Results 3
Quantitative real-world evaluation on the Leap Hand. We report the success rate over 10 grasp attempts for eight objects from the YCB dataset.