SuperQ-GRASP:

Superquadrics-based Grasp Pose Estimation
on Larger Objects for Mobile-Manipulation

University of Minnesota, Twin Cities





SuperQ-GRASP is a grasp pose estimation method based on primitive decomposition, designed specifically to grasp large objects that are uncommon in tabletop scenarios.

Abstract

Grasp planning and estimation is a long-standing research problem in robotics, with two main approaches to finding graspable poses on objects: 1) the geometric approach, which relies on 3D models of the object and the gripper to estimate valid grasp poses, and 2) the data-driven, learning-based approach, with models trained to identify grasp poses from raw sensor observations. The latter assumes comprehensive geometric coverage during the training phase. However, the data-driven approach is typically biased toward tabletop scenarios and struggles to generalize to out-of-distribution scenarios with larger objects (e.g. a chair). Additionally, raw sensor data (e.g. RGB-D data) from a single view of these larger objects is often incomplete and necessitates additional observations. In this paper, we take a geometric approach, leveraging advancements in object modeling (e.g. NeRF) to build an implicit model from RGB images captured around the target object. This model enables the extraction of an explicit mesh model while also capturing the visual appearance from novel viewpoints, which is useful for perception tasks like object detection and pose estimation. We further decompose the NeRF-reconstructed 3D mesh into superquadrics (SQs) - parametric geometric primitives, each mapped to a set of precomputed grasp poses - allowing grasp composition on the target object based on these primitives. Our proposed pipeline thus overcomes: a) noisy depth and incomplete views of the object, through the modeling step, and b) the limited generalization of data-driven methods to objects of larger sizes.

We validate the performance of our pipeline on 5 different large objects at different poses in real-world experiments using SPOT from Boston Dynamics.

Video

SuperQ-Grasp

Grasp Pose Estimation

The primary contribution of this project is a grasp pose estimation method for large objects that are uncommon in tabletop scenarios. The method involves decomposing the target object mesh into several primitive shapes, predicting grasp poses for each individual primitive, and subsequently filtering out the invalid poses to keep only the valid ones, as sketched in the code below. In this context, superquadrics are used as the primitive shapes, and Marching Primitives is employed to decompose the target object's mesh into smaller superquadrics, on which grasp pose estimation is applied.



Overview of the Grasp Pose Estimation module

Fig.2 - A comprehensive pipeline specifically designed to estimate grasp poses for larger objects. By representing an object as a collection of superquadrics, the proposed grasp pose estimation method (SuperQ-GRASP) estimates the grasp pose closest to the current gripper by selecting the nearest superquadric and its corresponding valid grasp candidates. Combined with the object detection and pose estimation module, our pipeline enables the mobile manipulator to perform grasping tasks effectively.
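Below is a minimal sketch, in Python, of the grasp-selection flow in Fig. 2. The superquadric decomposition (Marching Primitives), the per-primitive precomputed grasp tables, and the gripper-mesh collision check are treated as given inputs here; their names and signatures are our illustrative assumptions, not the authors' actual code.

```python
# Minimal sketch of the grasp-selection flow: keep collision-free grasps
# gathered from all superquadrics, then pick the one closest to the gripper.
from typing import Callable, Dict, List
import numpy as np

def select_grasp(
    sq_grasps: Dict[int, List[np.ndarray]],            # superquadric index -> list of 4x4 grasp poses
    is_collision_free: Callable[[np.ndarray], bool],    # checks a grasp against the full object mesh
    gripper_pose: np.ndarray,                           # current 4x4 gripper pose
) -> np.ndarray:
    """Filter per-primitive grasp candidates and return the closest valid one."""
    valid = [g for grasps in sq_grasps.values() for g in grasps if is_collision_free(g)]
    if not valid:
        raise RuntimeError("no valid grasp pose found")
    dists = [np.linalg.norm(g[:3, 3] - gripper_pose[:3, 3]) for g in valid]
    return valid[int(np.argmin(dists))]

# Toy usage: two primitives, one candidate grasp each, trivial collision check.
grasp_a, grasp_b = np.eye(4), np.eye(4)
grasp_b[:3, 3] = [0.0, 0.0, 0.5]
best = select_grasp({0: [grasp_a], 1: [grasp_b]}, lambda g: True, np.eye(4))
print(best[:3, 3])
```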



Real-world Experiments

We validate the performance of our pipeline on the robotic platform SPOT from Boston Dynamics. We use Instant-NGP to construct the target object mesh. Unlike synthetic data in simulation, the object pose with respect to the gripper is unknown in advance in real-world experiments. To deal with this, we rely on GroundingSAM and LoFTR to estimate the object pose relative to the gripper, as sketched below.
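As a rough illustration of this pose estimation step, the sketch below matches a rendered reference view of the object (with known depth and intrinsics) against the live gripper-camera image using LoFTR via kornia, then recovers the pose with PnP + RANSAC. The GroundingSAM-based detection and cropping is omitted, and the reference-view setup and variable names are our assumptions rather than the authors' exact implementation.

```python
# Feature matching (LoFTR) + PnP sketch for estimating the object pose
# in the gripper-camera frame from a reference view with known depth.
import cv2
import numpy as np
import torch
from kornia.feature import LoFTR

matcher = LoFTR(pretrained="outdoor").eval()

def to_gray_tensor(img_bgr: np.ndarray) -> torch.Tensor:
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY).astype(np.float32) / 255.0
    return torch.from_numpy(gray)[None, None]            # shape (1, 1, H, W)

def estimate_object_pose(ref_img, ref_depth, K_ref, live_img, K_live):
    with torch.no_grad():
        out = matcher({"image0": to_gray_tensor(ref_img),
                       "image1": to_gray_tensor(live_img)})
    kps_ref = out["keypoints0"].cpu().numpy()
    kps_live = out["keypoints1"].cpu().numpy()

    # Back-project reference keypoints to 3D using the reference depth map.
    u, v = kps_ref[:, 0], kps_ref[:, 1]
    z = ref_depth[v.astype(int), u.astype(int)]
    valid = z > 0
    x = (u[valid] - K_ref[0, 2]) * z[valid] / K_ref[0, 0]
    y = (v[valid] - K_ref[1, 2]) * z[valid] / K_ref[1, 1]
    pts3d = np.stack([x, y, z[valid]], axis=1)

    # PnP + RANSAC gives the pose of the reference frame in the live camera.
    ok, rvec, tvec, _ = cv2.solvePnPRansac(
        pts3d.astype(np.float64), kps_live[valid].astype(np.float64), K_live, None)
    R, _ = cv2.Rodrigues(rvec)
    return ok, R, tvec
```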



Results

Experiments on synthetic data

We create a dataset of 20 objects: 15 synthetic objects (3 chairs, 3 carts, 2 buckets, 2 boxes, 2 suitcases, 2 tables, and 1 folding chair) selected from PartNet-Mobility, and 5 real-world objects (2 chairs, 1 vacuum cleaner, 1 suitcase, and 1 table). These objects represent common large objects encountered daily and cover a diverse range of geometric structures.

We establish two baselines to capture variations in how Contact-GraspNet can be employed for grasp pose estimation, allowing for comparison with our SuperQ-GRASP method: 1) CG+Mesh: this baseline applies Contact-GraspNet to the point cloud sampled from the complete 3D mesh of the target object; 2) CG+Depth: this baseline applies Contact-GraspNet to the point cloud obtained from a single-view depth image as seen by the robot's gripper camera. A sketch of how these two inputs differ is given below.
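For illustration, the sketch below shows how the inputs to the two baselines could be constructed: a point cloud sampled from the complete mesh (CG+Mesh) versus one back-projected from a single depth image (CG+Depth). Contact-GraspNet itself is not invoked here, and the function names and parameters are our own.

```python
# Constructing the two baseline inputs: full-coverage cloud from the mesh
# vs. partial, single-view cloud from a depth image.
import numpy as np
import trimesh

def cloud_from_mesh(mesh_path: str, n_points: int = 20000) -> np.ndarray:
    mesh = trimesh.load(mesh_path)                     # assumes a single Trimesh
    points, _ = trimesh.sample.sample_surface(mesh, n_points)
    return np.asarray(points)                          # CG+Mesh input

def cloud_from_depth(depth: np.ndarray, K: np.ndarray) -> np.ndarray:
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.reshape(-1)
    valid = z > 0
    x = (u.reshape(-1)[valid] - K[0, 2]) * z[valid] / K[0, 0]
    y = (v.reshape(-1)[valid] - K[1, 2]) * z[valid] / K[1, 1]
    return np.stack([x, y, z[valid]], axis=1)          # CG+Depth input (partial view)
```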

Compared to the two baseline methods, our pipeline predicts more stable grasp poses in the region closer to the camera, which is also the starting pose of the gripper for the SPOT robot in our setup. In addition, the predicted grasp poses are more concentrated in a specific region.

Qualitative Results on selected Objects


Evaluation on the object


Contact-GraspNet + Depth | Contact-GraspNet + Mesh | SuperQ-GRASP
NOTE:
Red: invalid grasp poses; Green: valid grasp poses
Blue dots: observed depth point cloud as a partial view of the object



Real-world Experiments

To validate the performance of our pipeline in real-world scenarios, we place each of the 5 real-world objects at a specified location with arbitrary orientations. The Boston Dynamics Spot robot is then tasked with estimating the object's pose, identifying a graspable pose, and executing a reach-and-grasp action.

Our pipeline demonstrates a higher success rate across four test objects (two chairs, a vacuum cleaner, and a table), highlighting its capability to estimate valid grasp poses for larger objects with complex geometries, including high-genus objects like chairs. Here are the demonstrations.

Real-world Experiment examples


Additional Results

Results on small objects

In addition, we show that our pipeline achieves competitive performance compared to the two baseline methods on small synthetic objects that are typically placed on a tabletop. The mesh models of these objects are taken from PartNet-Mobility.

Qualitative Results on small Objects


Evaluation on the object


Contact-GraspNet + Depth | Contact-GraspNet + Mesh | SuperQ-GRASP
NOTE:
Red: invalid grasp poses; Green: valid grasp poses
Blue dots: observed depth point cloud as a partial view of the object


Custom graspable region

We also demonstrate that our pipeline allows the user to select a custom graspable region. Each individual superquadric at the edge of the object (labeled in a different color and associated with its own index) can be regarded as a potential graspable region where grasp poses can be generated and evaluated. Depending on the downstream task, the user can also specify the index of a superquadric directly to select the desired graspable region for generating valid grasp poses.

Custom graspable region selection examples

By default, the pipeline selects the superquadric closest to the current gripper as the graspable region for generating grasp poses.
If specified, the user can instead select a custom superquadric as the desired graspable region (in this example, the user wants to grasp one edge of the back of the chair, so the index of that superquadric, 54, is fed to the pipeline). A toy sketch of this selection logic follows.
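A toy sketch of the two selection modes, assuming the superquadric centers are available as an array; the index 54 and the data layout are illustrative only.

```python
# Selecting the graspable region: nearest superquadric by default,
# or a user-specified superquadric index if one is given.
from typing import Optional
import numpy as np

def select_superquadric(sq_centers: np.ndarray,
                        gripper_pos: np.ndarray,
                        user_index: Optional[int] = None) -> int:
    """Return the index of the superquadric to use as the graspable region."""
    if user_index is not None:
        return user_index                               # user-specified region
    # Default: the superquadric whose center is closest to the gripper.
    return int(np.argmin(np.linalg.norm(sq_centers - gripper_pos, axis=1)))

# Example: default nearest-to-gripper vs. explicitly requesting primitive 54.
centers = np.random.rand(60, 3)
print(select_superquadric(centers, np.zeros(3)))
print(select_superquadric(centers, np.zeros(3), user_index=54))
```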