Episode Manifest Design¶

Purpose¶

episode_manifest.json is the raw episode's resolved snapshot.

It exists so one raw bag can still be understood later without depending on:

current UI state
current local sensors files
current calibration files
current code assumptions about what was probably recorded

Core Decision¶

Each raw episode carries one manifest that answers:

what this take was
which session setup produced it
which profile reference was active
which sensors were actually resolved
which topics were actually recorded
which repo commit created it

Reusable policy stays outside the manifest. Episode-time truth stays inside it.

Current Manifest Shape¶

Today the recorder writes these top-level sections:

episode
session when the take came from a resolved session-plan path
profile
capture
sensors
recorded_topics
provenance

Example shape:

{
  "episode": {
    "episode_id": "episode-20260406-190533",
    "task_name": "pick_place",
    "language_instruction": "pick up the object and place it in the target area",
    "active_arms": ["lightning"],
    "operator": "srinivas"
  },
  "session": {
    "session_id": "20260406-185500",
    "active_arms": ["lightning"],
    "sensors_file": "data_pipeline/configs/sensors.local.yaml",
    "devices": [...],
    "selected_topics": [...]
  },
  "profile": {
    "name": "multisensor_20hz",
    "clock_policy": "host_capture_time_v1"
  },
  "capture": {
    "start_time_ns": 1712448333000000000,
    "end_time_ns": 1712448341000000000,
    "storage": {
      "bag_storage_id": "mcap"
    }
  },
  "sensors": {
    "sensors_file": "data_pipeline/configs/sensors.local.yaml",
    "calibration_results_file": "data_pipeline/configs/calibration.local.json",
    "devices": [...]
  },
  "recorded_topics": [
    {
      "topic": "/spark/session/teleop_active",
      "message_type": "std_msgs/msg/Bool"
    }
  ],
  "provenance": {
    "git_commit": "<repo commit sha>"
  }
}

Notes:

session is conditional
recorded_topics is intentionally a flat list snapshot
detailed per-sensor metadata lives under sensors.devices

What Each Section Means¶

`episode`¶

This is the human-facing description of the take:

episode id
task name
language instruction
active arms
operator

This is the minimum answer to:

what was this take supposed to be?

`session`¶

This is the resolved session snapshot when the take came from the operator console or another session-plan path.

It captures:

resolved active arms
chosen sensors file
resolved devices
resolved selected topics

This matters because the session plan is the concrete recording decision, not just a UI blob.

`profile`¶

This is the conversion-policy reference recorded at take time.

Today it stores only:

profile name
clock policy

The manifest does not inline the full reusable YAML.

`capture`¶

This is the raw bag write metadata:

start and end time
bag storage backend

This is where the manifest says how the raw bag was written.

`sensors`¶

This ties the take back to the local rig description at record time.

It records:

the selected sensors file path
the selected calibration results file path when present
resolved device entries under sensors.devices

Those device entries are where the detailed sensor metadata lives.

`recorded_topics`¶

This is the resolved topic inventory snapshot for the take.

It records only:

topic name
ROS message type

That is enough for readers that need to understand what the recorder actually captured, without stuffing static topic-contract prose into every episode.

`provenance`¶

This is where code-level provenance goes.

Current provenance includes:

provenance.git_commit

That is the repository commit recorded at episode creation time.

What Belongs In The Manifest¶

The manifest should carry episode-specific or record-time-resolved truth such as:

episode metadata
resolved session metadata when recorded through the operator console
the active profile reference
capture storage details
resolved sensor device entries
recorded topic inventory
record-time provenance

For sensors, that includes record-time metadata such as:

serial numbers or device paths
camera model and firmware when available
stream profiles and intrinsics when exposed by the bridge
calibration snapshot when solved calibration exists

What Does Not Belong In The Manifest¶

The manifest should not become a dumping ground for reusable config or dead version markers.

Keep these outside it:

the shared topic contract
the full reusable conversion YAML
the live sensors file as a mutable source of truth
the live calibration file as a mutable source of truth
archive-time transcode results
decorative schema or inventory version fields that no reader uses

The rule is simple:

store resolved episode truth
do not duplicate reusable policy
do not add fields just because they sound future-proof

Relationship To Other Files¶

Sensors file¶

The sensors file maps physical device identity to canonical sensor keys.

It answers:

which physical device is usually /spark/cameras/world/scene_1?

It does not answer the full per-episode question of what was actually recorded.

Calibration results¶

calibration.local.json holds the current solved camera geometry.

It remains the working local results file, but the recorder snapshots the relevant solved values into the manifest so old episodes stay self-describing.