Viewer Integration

Purpose

This page explains the current local viewer design, what it owns, and where the remaining design debt lives.

Core Decision

The viewer is supported as a local review tool, not as a networked service surface.

Current supported contract:

the viewer server runs on the same machine as the operator console
the browser opens on that same machine
the base URL defaults to an account-local localhost port
PIPELINE_VIEWER_BASE_URL can override the host and port when needed
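The base-URL rule above (an account-local localhost default, overridable by PIPELINE_VIEWER_BASE_URL) can be sketched as follows. This is a minimal illustration, not the backend's actual code: the function name `resolve_viewer_base_url` is hypothetical, the default port is passed in rather than derived, and rejecting a malformed override is an assumed policy.

```python
import os
from typing import Mapping, Optional
from urllib.parse import urlsplit

OVERRIDE_VAR = "PIPELINE_VIEWER_BASE_URL"


def resolve_viewer_base_url(default_port: int, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the viewer base URL: the override if set, else an account-local localhost URL.

    Hypothetical helper; `default_port` stands in for the account-local port.
    """
    env = os.environ if env is None else env
    override = env.get(OVERRIDE_VAR, "").strip()
    if override:
        parts = urlsplit(override)
        # Assumed policy: fail loudly on a malformed override instead of opening a broken URL.
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(f"{OVERRIDE_VAR} must be an absolute http(s) URL, got {override!r}")
        return override.rstrip("/")
    # Local-only default: the viewer server and the browser run on this machine.
    return f"http://localhost:{default_port}"
```

For example, `resolve_viewer_base_url(3001, env={})` yields `http://localhost:3001`, while setting the override to `http://127.0.0.1:4000/` yields `http://127.0.0.1:4000`.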

This local-only assumption removed much of the confusion around choosing a hostname and around stale, environment-specific settings.

Current runtime assumptions are also account-local:

the viewer repo lives at the sibling path ../lerobot-dataset-visualizer
bun lives under ~/.bun/bin/bun
the viewer must already have a production build produced by data_pipeline/setup_viewer_env.sh
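Because all three runtime assumptions are filesystem facts, they can be checked up front before anything is started. The sketch below is illustrative only: `viewer_prereq_problems` is a hypothetical name, the sibling path is assumed to be relative to the operator-console repo root, and the `.next` build marker assumes a Next.js-style production build.

```python
from pathlib import Path
from typing import List

VIEWER_REPO_NAME = "lerobot-dataset-visualizer"


def viewer_prereq_problems(console_repo: Path, home: Path, build_marker: str = ".next") -> List[str]:
    """Return human-readable problems; an empty list means the prerequisites look satisfied.

    Assumptions: the viewer repo is a sibling of `console_repo`, and `build_marker`
    (hypothetical default ".next") exists only after a production build.
    """
    problems: List[str] = []
    viewer_repo = console_repo.parent / VIEWER_REPO_NAME
    bun = home / ".bun" / "bin" / "bun"

    if not viewer_repo.is_dir():
        problems.append(f"viewer repo not found at {viewer_repo}")
    elif not (viewer_repo / build_marker).exists():
        # Only meaningful once the repo exists; point at the setup script.
        problems.append(f"no production build in {viewer_repo}; run data_pipeline/setup_viewer_env.sh")

    # Existence check only; an executable-bit check could be added with os.access.
    if not bun.is_file():
        problems.append(f"bun not found at {bun}")
    return problems
```

Reporting every problem at once, rather than stopping at the first, lets the operator fix the whole setup in one pass.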

Why `Open Viewer` Owns Startup

The operator should not need to manually manage a separate viewer lifecycle for normal review.

That is why Open Viewer owns:

resolving the current published dataset target
ensuring the local dataset server is running
starting or restarting the viewer server if needed
opening the resolved episode URL
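The four responsibilities above form a fixed sequence. The sketch below shows that ordering with each step injected as a callable, so it stays independent of any real server code. All hook names are hypothetical, and the `/local/<dataset_id>/episode_<n>` route is an assumed viewer URL shape, not a documented one.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class OpenViewerSteps:
    # Hypothetical hooks; the real backend's functions will be named differently.
    resolve_target: Callable[[], Tuple[str, int]]  # -> (dataset_id, episode_index)
    ensure_dataset_server: Callable[[], str]       # -> dataset server base URL
    ensure_viewer_server: Callable[[str], str]     # dataset base URL -> viewer base URL
    open_browser: Callable[[str], None]


def open_viewer(steps: OpenViewerSteps) -> str:
    """Run the Open Viewer sequence and return the URL that was opened."""
    dataset_id, episode = steps.resolve_target()
    # The dataset server must be up first: the viewer is started against its URL.
    dataset_base = steps.ensure_dataset_server()
    viewer_base = steps.ensure_viewer_server(dataset_base)
    # Assumed route shape for a single episode in the viewer.
    url = f"{viewer_base}/local/{dataset_id}/episode_{episode}"
    steps.open_browser(url)
    return url
```

Injecting the steps keeps the ordering testable without starting real processes; in production each hook would wrap the corresponding server or browser call.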

The setup script only prepares the toolchain and the production build; runtime startup remains owned by the operator console.

In the current backend, Open Viewer also checks that the selected dataset's meta/info.json is actually reachable before treating the viewer as ready.
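A readiness probe of this kind might look like the sketch below. It assumes the check goes through the dataset server, using the `/datasets/local/<dataset_id>/resolve/main/...` URL shape described in the serving model, and that "reachable" means an HTTP 200 whose body parses as JSON. The function names are hypothetical.

```python
import json
import urllib.request


def info_json_url(dataset_base_url: str, dataset_id: str) -> str:
    """Build the meta/info.json URL for a dataset on the local dataset server."""
    return f"{dataset_base_url.rstrip('/')}/datasets/local/{dataset_id}/resolve/main/meta/info.json"


def dataset_is_reachable(dataset_base_url: str, dataset_id: str, timeout: float = 2.0) -> bool:
    """Return True only if meta/info.json is served with status 200 and parses as JSON."""
    url = info_json_url(dataset_base_url, dataset_id)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            if resp.status != 200:
                return False
            json.load(resp)
    # URLError/HTTPError subclass OSError; JSONDecodeError subclasses ValueError.
    except (OSError, ValueError):
        return False
    return True
```

Parsing the body, not just checking the status, catches the case where some other service answers on the expected port.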

Why The Viewer Is Separate From Conversion

The viewer inspects published datasets. It does not define them.

That boundary matters because:

conversion should succeed without the viewer running
the viewer should not become a hidden dependency of raw recording
published datasets remain filesystem artifacts, not viewer-owned objects

Current Local Dataset Serving Model

The current local viewer integration uses two explicit local servers:

a generic viewer server from lerobot-dataset-visualizer
a read-only dataset server owned by spark-data-collection

The dataset server exposes published datasets directly from:

spark-data-collection/published/<dataset_id>

through the URL shape the viewer expects:

/datasets/local/<dataset_id>/resolve/main/...
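The core of a read-only dataset server is the mapping from that URL shape back to a file under published/<dataset_id>. The sketch below shows one way to do it while refusing anything outside the dataset directory. The function name is hypothetical, `url_path` is assumed to be the path component only (no query string), and rejecting symlinks that resolve outside the dataset is an assumed policy.

```python
from pathlib import Path, PurePosixPath
from typing import Optional
from urllib.parse import unquote

URL_PREFIX = "/datasets/local/"


def resolve_dataset_file(published_root: Path, url_path: str) -> Optional[Path]:
    """Map /datasets/local/<dataset_id>/resolve/main/<file> to published/<dataset_id>/<file>.

    Returns None for any path outside that shape, outside the dataset directory,
    or not naming an existing regular file.
    """
    if not url_path.startswith(URL_PREFIX):
        return None
    decoded = unquote(url_path[len(URL_PREFIX):])
    if "\x00" in decoded:
        return None
    rel = PurePosixPath(decoded)
    parts = rel.parts
    # Expect <dataset_id>/resolve/main/<at least one path segment>.
    if rel.is_absolute() or len(parts) < 4 or parts[1:3] != ("resolve", "main"):
        return None
    if ".." in parts:
        return None
    dataset_dir = (published_root / parts[0]).resolve()
    candidate = dataset_dir.joinpath(*parts[3:]).resolve()
    # Containment check after resolving symlinks; requires Python 3.9+.
    if not candidate.is_relative_to(dataset_dir) or not candidate.is_file():
        return None
    return candidate
```

A request handler would return 404 whenever this returns None and otherwise stream the file; since the mapping never writes, the server stays read-only by construction.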

The backend then starts the viewer with:

DATASET_URL=<dataset_base_url>/datasets
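Concretely, this amounts to adding one variable to the viewer process's environment. The sketch below builds the launch spec separately from launching, so the environment can be inspected. Only DATASET_URL comes from this page; the `bun run start` command and the PORT variable are assumptions about the viewer's production scripts, and the function names are hypothetical.

```python
import os
import subprocess
from typing import Dict, List, Mapping, Optional, Tuple


def build_viewer_launch(
    bun: str,
    viewer_repo: str,
    dataset_base_url: str,
    viewer_port: int,
    base_env: Optional[Mapping[str, str]] = None,
) -> Tuple[List[str], Dict[str, str], str]:
    """Return (argv, env, cwd) for starting the viewer against the local dataset server."""
    env = dict(os.environ if base_env is None else base_env)
    # Documented contract: the viewer reads datasets from <dataset_base_url>/datasets.
    env["DATASET_URL"] = f"{dataset_base_url.rstrip('/')}/datasets"
    # Assumption: the viewer's production server reads its listen port from PORT.
    env["PORT"] = str(viewer_port)
    # Hypothetical: assumes the viewer's package.json defines a "start" script.
    argv = [bun, "run", "start"]
    return argv, env, viewer_repo


def start_viewer(bun: str, viewer_repo: str, dataset_base_url: str, viewer_port: int) -> subprocess.Popen:
    """Launch the viewer as a child process using the spec above."""
    argv, env, cwd = build_viewer_launch(bun, viewer_repo, dataset_base_url, viewer_port)
    return subprocess.Popen(argv, env=env, cwd=cwd)
```

Setting DATASET_URL per launch, instead of in a file inside the viewer repo, is what keeps the viewer free of hidden dataset state.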

This keeps the dataset truth in one place and removes the hidden mirror state that previously lived inside the viewer repo.

By default, each Unix account gets its own local viewer port and its own local dataset-server port. That prevents the cross-account failure mode where two users on the same machine silently reuse the same localhost service.
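One simple way to give each Unix account its own pair of ports is to offset two fixed bases by the numeric user ID. The sketch below is an assumption, not the backend's scheme: the base ports and span are made-up values, chosen so the two ranges never overlap and stay below Linux's default ephemeral range (32768 to 60999).

```python
import os
from typing import Optional, Tuple

# Hypothetical constants; the real default ports are not documented on this page.
VIEWER_PORT_BASE = 20000   # viewer ports: 20000..24999
DATASET_PORT_BASE = 25000  # dataset-server ports: 25000..29999
PORT_SPAN = 5000


def account_ports(uid: Optional[int] = None) -> Tuple[int, int]:
    """Return (viewer_port, dataset_port) for a Unix account (defaults to the current user)."""
    uid = os.getuid() if uid is None else uid
    offset = uid % PORT_SPAN
    return VIEWER_PORT_BASE + offset, DATASET_PORT_BASE + offset
```

For uid 1000 this gives (21000, 26000). Two accounts collide only if their UIDs differ by a multiple of PORT_SPAN, which is rare on a single workstation; a port already held by an unrelated service still needs to be detected at startup.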

Remaining Design Debt

The viewer path is cleaner now, but some real design debt remains:

the contract still spans two sibling repos
the frontend toolchain still introduces a separate per-account setup surface
the viewer is still adapted from a Hugging Face-oriented app rather than designed natively for this local pipeline

Design Rule

Any future viewer work should preserve these operator-facing truths:

Open Viewer is the one-click review entrypoint
the operator should not think about dataset-serving plumbing
published datasets remain the source artifact being reviewed

If the implementation changes later, those user-facing properties should stay.