Grasp planning and estimation have long been studied in robotics, with two main approaches to finding graspable poses on objects: 1) the geometric approach, which relies on 3D models of the object and the gripper to estimate valid grasp poses, and 2) the data-driven, learning-based approach, in which models are trained to identify grasp poses from raw sensor observations. The latter assumes comprehensive geometric coverage during the training phase. However, data-driven methods are typically biased toward tabletop scenarios and struggle to generalize to out-of-distribution scenarios with larger objects (e.g., a chair). Additionally, raw sensor data (e.g., RGB-D data) from a single view of these larger objects is often incomplete and necessitates additional observations. In this paper, we take a geometric approach, leveraging advances in object modeling (e.g., NeRF) to build an implicit model from RGB images taken from views around the target object. This model enables the extraction of an explicit mesh model while also capturing the visual appearance from novel viewpoints, which is useful for perception tasks such as object detection and pose estimation. We further decompose the NeRF-reconstructed 3D mesh into superquadrics (SQs), parametric geometric primitives that are each mapped to a set of precomputed grasp poses, allowing grasp composition on the target object based on these primitives. Our proposed pipeline addresses two problems: a) noisy depth and incomplete views of the object, through the modeling step, and b) generalization to objects of any size.
We validate the performance of our pipeline on 5 different large objects at different poses in real-world experiments using SPOT from Boston Dynamics.
The primary contribution of the project is a grasp pose estimation method for large objects that are uncommon in tabletop scenarios. The method decomposes the target object mesh into several primitive shapes, predicts grasp poses for each individual primitive, and then filters out the invalid poses so that only valid ones remain. Superquadrics serve as the primitive shapes, and Marching Primitives is used to decompose the target object's mesh into smaller superquadrics, on which grasp pose estimation is applied.
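To make the primitive-based filtering step concrete, the sketch below evaluates the standard superquadric inside-outside function and uses it to discard candidate grasp positions that collide with a neighboring primitive. It is only an illustration under stated assumptions: the validity criterion (a grasp position must not lie inside any other primitive), the function names, and the `(rotation, translation, scale, exponents)` tuple layout are hypothetical and are not taken from the pipeline described here.

```python
import numpy as np


def superquadric_inside_outside(points, scale, exponents):
    """Standard superquadric inside-outside function F.

    F < 1 inside the primitive, F == 1 on its surface, F > 1 outside.
    `points` is an (N, 3) array in the primitive's canonical frame,
    `scale` is (a1, a2, a3) and `exponents` is (e1, e2).
    """
    pts = np.abs(np.asarray(points, dtype=float)) / np.asarray(scale, dtype=float)
    e1, e2 = exponents
    xy = pts[:, 0] ** (2.0 / e2) + pts[:, 1] ** (2.0 / e2)
    return xy ** (e2 / e1) + pts[:, 2] ** (2.0 / e1)


def filter_grasp_positions(positions, primitives, own_index):
    """Keep grasp positions that do not fall inside any OTHER primitive.

    Assumed criterion, for illustration only. Each entry of `primitives`
    is a hypothetical tuple (R, t, scale, exponents), where R (3x3) and
    t (3,) map the canonical frame into the world frame.
    """
    positions = np.asarray(positions, dtype=float)
    keep = np.ones(len(positions), dtype=bool)
    for i, (R, t, scale, exps) in enumerate(primitives):
        if i == own_index:
            continue
        # World -> canonical frame: R^T (p - t), written for row vectors.
        local = (positions - np.asarray(t, dtype=float)) @ np.asarray(R, dtype=float)
        keep &= superquadric_inside_outside(local, scale, exps) >= 1.0
    return positions[keep]
```

With both exponents equal to 1 the primitive reduces to an ellipsoid, which is a convenient sanity check. A fuller validity test would also account for the gripper geometry and approach direction, which this sketch ignores.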
We validate the performance of our pipeline on the robotic platform SPOT from Boston Dynamics. We use instant-NGP to construct the target object mesh. Unlike with synthetic data in simulation, the object pose with respect to the gripper is not known in advance in real-world experiments. We therefore rely on GroundingSAM and LoFTR to estimate the object pose relative to the gripper.
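Once the object pose relative to the gripper has been estimated, grasp poses defined in the object frame can be expressed in the gripper frame by composing homogeneous transforms. The following is a minimal sketch of that composition only; the function names and the example numbers are hypothetical, and it does not reproduce how GroundingSAM or LoFTR produce the pose estimate.

```python
import numpy as np


def make_transform(rotation, translation):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a 3-vector."""
    T = np.eye(4)
    T[:3, :3] = np.asarray(rotation, dtype=float)
    T[:3, 3] = np.asarray(translation, dtype=float)
    return T


def grasp_in_gripper_frame(T_gripper_object, T_object_grasp):
    """Express an object-frame grasp pose in the gripper frame.

    T_gripper_object: estimated pose of the object in the gripper frame.
    T_object_grasp:   grasp pose defined on the object model.
    """
    return T_gripper_object @ T_object_grasp


# Illustrative values: object 1 m ahead along x, rotated 90 degrees about z.
rot_z_90 = np.array([[0.0, -1.0, 0.0],
                     [1.0, 0.0, 0.0],
                     [0.0, 0.0, 1.0]])
T_go = make_transform(rot_z_90, [1.0, 0.0, 0.0])
# A grasp located 0.2 m along the object's own x axis, with no extra rotation.
T_og = make_transform(np.eye(3), [0.2, 0.0, 0.0])
T_gg = grasp_in_gripper_frame(T_go, T_og)
```

In this example the grasp position lands at (1.0, 0.2, 0.0) in the gripper frame, because the object's x axis points along the gripper's y axis after the rotation.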