Learning Category-level Last-meter Navigation from RGB Demonstrations of a Single-instance

University of Minnesota

Last-meter navigation enables robots to achieve manipulation-ready positioning, bridging the critical gap between global navigation and manipulation.

Last-Meter Navigation

Object-centric imitation learning framework that solves the last-meter navigation problem and produces manipulation-ready base poses with centimeter-level accuracy.

One-Instance Transfer

Demonstration of strong instance-to-category generalization, where a model trained on a single object instance reliably transfers to unseen objects of the same category.

RGB-Only Perception

Real-world validation that precise last-meter navigation is achievable using only onboard RGB observations, without depth, LiDAR, or map priors.

Abstract

Precise positioning of the mobile manipulator's base is essential for the manipulation actions that follow. However, most RGB-based navigation systems guarantee only coarse, meter-level accuracy, making them ill-suited to the precise positioning phase of mobile manipulation.

This gap prevents manipulation policies from operating within the distribution of their training demonstrations, resulting in frequent execution failures. We address this gap by introducing an object-centric imitation learning framework for last-meter navigation, enabling a quadruped mobile manipulator to achieve manipulation-ready positioning using only RGB observations from its onboard cameras.

Our method conditions the navigation policy on three inputs: goal images, multi-view RGB observations from the onboard cameras, and a text prompt specifying the target object. A language-driven segmentation module and a spatial score-matrix decoder then supply explicit object grounding and relative pose reasoning. Using real-world data from a single object instance within a category, the system generalizes to unseen object instances across diverse environments with challenging lighting and background conditions. To evaluate this comprehensively, we introduce two metrics: an edge-alignment metric, which uses the ground-truth orientation, and an object-alignment metric, which evaluates how well the robot visually faces the target. Under these metrics, our policy achieves 73.47% success in edge-alignment and 96.94% success in object-alignment when positioning relative to unseen target objects. These results show that precise last-meter navigation can be achieved at the category level without depth, LiDAR, or map priors, enabling a scalable pathway toward unified mobile manipulation.
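For concreteness, the sketch below shows one way the two success criteria could be scored. The tolerance values, the pixel-centering test for object-alignment, and the yaw-only edge-alignment check are illustrative assumptions, not the exact definitions used in our evaluation.

```python
import numpy as np

def edge_alignment_success(robot_yaw, gt_goal_yaw, yaw_tol_deg=15.0):
    """Edge-alignment (illustrative): compare the robot's final heading against
    the ground-truth goal orientation; yaw_tol_deg is a hypothetical tolerance,
    not the threshold used in the paper. Angles are in radians."""
    err = np.abs(np.arctan2(np.sin(robot_yaw - gt_goal_yaw),
                            np.cos(robot_yaw - gt_goal_yaw)))
    return np.degrees(err) <= yaw_tol_deg

def object_alignment_success(target_mask, image_width, center_tol=0.15):
    """Object-alignment (illustrative): the segmented target should appear
    roughly centered in the final onboard RGB frame, i.e. the robot faces it."""
    cols = np.nonzero(target_mask)[1]   # column indices of target pixels
    if cols.size == 0:                  # target not visible -> failure
        return False
    offset = abs(cols.mean() / image_width - 0.5)
    return offset <= center_tol
```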

Sequential Last-meter Navigation for Mobile Manipulation

Last-meter navigation enables the robot to navigate effectively between different objects, facilitating sequential multi-stage mobile manipulation tasks. By chaining last-meter navigation policies, the robot can transition from one workspace to another with high precision.
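As a loose illustration of this chaining, the sketch below strings together a coarse global-navigation step, the last-meter policy, and a manipulation skill for each stage. The callables and their signatures are hypothetical placeholders, not our actual interfaces.

```python
from typing import Callable, Iterable, Tuple

# Hypothetical orchestration of a multi-stage mobile manipulation task.
# global_nav and last_meter_nav stand in for the global planner and the
# last-meter policy; each stage supplies a text prompt, a goal image,
# and a manipulation skill to run once the base is positioned.
def run_sequential_task(
    global_nav: Callable[[str], None],
    last_meter_nav: Callable[[str, object], None],
    stages: Iterable[Tuple[str, object, Callable[[], None]]],
) -> None:
    for prompt, goal_image, manipulation_skill in stages:
        global_nav(prompt)                  # coarse, meter-level approach
        last_meter_nav(prompt, goal_image)  # precise, manipulation-ready base pose
        manipulation_skill()                # e.g., grasp or place at this workspace
```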

Robustness to Dynamic Environments

Our method demonstrates robustness to dynamic environments, successfully adapting to target objects that move during operation. The system continuously re-evaluates the target's pose relative to the robot and adjusts the approach path in real time.
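A minimal closed-loop sketch of this behavior is given below; get_observations, send_base_command, and the policy interface are hypothetical stand-ins for the onboard camera stream and the learned policy, and the STOP handling mirrors the auxiliary stopping mechanism only schematically.

```python
# Schematic closed-loop execution: the target is re-grounded from fresh RGB
# observations at every step, so a moving object simply shifts the predicted
# actions on the next iteration. All interfaces here are illustrative.
def last_meter_loop(policy, robot, goal_image, prompt, max_steps=200):
    for _ in range(max_steps):
        obs = robot.get_observations()                 # fresh multi-view RGB frames
        action = policy.step(obs, goal_image, prompt)  # re-evaluates relative pose
        if action == "STOP":                           # auxiliary stopping decision
            return True                                # manipulation-ready pose reached
        robot.send_base_command(action)                # Forward / Lateral / Rotate
    return False                                       # timed out before converging
```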

Last-meter Navigation

Last-meter Navigation Concept Overview

Last-meter navigation is the stage between global path planning and manipulation in which the robot must achieve centimeter-level positional accuracy and degree-level orientation accuracy relative to a target. Whereas global navigation often defines success as stopping within roughly one meter of the goal, manipulation policies operate reliably only under much tighter alignment, and this mismatch causes many mobile manipulation failures. Last-meter navigation addresses this gap by explicitly focusing on the final meter of motion so that the robot arrives in a manipulation-ready pose.

In the example on the left: (1) global navigation first drives the robot near the target; (2) once the target object (e.g., the orange chair) is detected, our policy is invoked; and (3) last-meter navigation adjusts the robot’s base to a precise manipulation-ready pose defined by a goal observation.

Methodology

System Architecture

Architecture Overview. At each timestep, the model receives current and goal observations. A segmentation module (driven by a language prompt) generates object masks. The action decoder uses a spatial score-matrix to predict discrete actions (Forward, Lateral, Rotate).
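The toy PyTorch module below sketches this per-timestep computation. The feature extractor, patch counts, layer sizes, and three-way discretization of each action head are assumptions made for illustration and do not reproduce the released implementation.

```python
import torch
import torch.nn as nn

class LastMeterPolicy(nn.Module):
    """Illustrative per-timestep decoder: patch features of the segmented target
    in the current and goal observations are correlated into a spatial
    score-matrix, which is decoded into discrete base actions."""

    def __init__(self, feat_dim=384, num_patches=256, hidden=256):
        super().__init__()
        self.score_proj = nn.Linear(feat_dim, feat_dim, bias=False)
        self.decoder = nn.Sequential(
            nn.Linear(num_patches * num_patches, hidden),
            nn.ReLU(),
        )
        # One discrete head per base-motion primitive (hypothetical 3-way bins).
        self.forward_head = nn.Linear(hidden, 3)  # e.g., {backward, stay, forward}
        self.lateral_head = nn.Linear(hidden, 3)  # e.g., {left, stay, right}
        self.rotate_head = nn.Linear(hidden, 3)   # e.g., {ccw, stay, cw}

    def forward(self, cur_feats, goal_feats):
        # cur_feats, goal_feats: (B, num_patches, feat_dim) patch features of the
        # masked target object in the current and goal views.
        scores = torch.einsum("bnd,bmd->bnm", self.score_proj(cur_feats), goal_feats)
        h = self.decoder(scores.flatten(1))       # spatial score-matrix -> embedding
        return self.forward_head(h), self.lateral_head(h), self.rotate_head(h)
```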

Quantitative Results

Quantitative results (Part 1 and Part 2).

Our system (DinoScoreAux) achieves the highest success rate of all compared baselines, demonstrating the importance of the spatial score-matrix decoder and the auxiliary stopping mechanism.