Episode Manifest Design¶
Purpose¶
episode_manifest.json is the raw episode's resolved snapshot.
It exists so one raw bag can still be understood later without depending on:
- current UI state
- current local sensors files
- current calibration files
- current code assumptions about what was probably recorded
Core Decision¶
Each raw episode carries one manifest that answers:
- what this take was
- which session setup produced it
- which profile reference was active
- which sensors were actually resolved
- which topics were actually recorded
- which repo commit created it
Reusable policy stays outside the manifest. Episode-time truth stays inside it.
Current Manifest Shape¶
Today the recorder writes these top-level sections:
- episode
- session, when the take came from a resolved session-plan path
- profile
- capture
- sensors
- recorded_topics
- provenance
Example shape:
{
  "episode": {
    "episode_id": "episode-20260406-190533",
    "task_name": "pick_place",
    "language_instruction": "pick up the object and place it in the target area",
    "active_arms": ["lightning"],
    "operator": "srinivas"
  },
  "session": {
    "session_id": "20260406-185500",
    "active_arms": ["lightning"],
    "sensors_file": "data_pipeline/configs/sensors.local.yaml",
    "devices": [...],
    "selected_topics": [...]
  },
  "profile": {
    "name": "multisensor_20hz",
    "clock_policy": "host_capture_time_v1"
  },
  "capture": {
    "start_time_ns": 1712448333000000000,
    "end_time_ns": 1712448341000000000,
    "storage": {
      "bag_storage_id": "mcap"
    }
  },
  "sensors": {
    "sensors_file": "data_pipeline/configs/sensors.local.yaml",
    "calibration_results_file": "data_pipeline/configs/calibration.local.json",
    "devices": [...]
  },
  "recorded_topics": [
    {
      "topic": "/spark/session/teleop_active",
      "message_type": "std_msgs/msg/Bool"
    }
  ],
  "provenance": {
    "git_commit": "<repo commit sha>"
  }
}
Notes:
- session is conditional
- recorded_topics is intentionally a flat list snapshot
- detailed per-sensor metadata lives under sensors.devices
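A reader can check the top-level shape before trusting the rest of a manifest. The sketch below is illustrative only: `check_sections` is a hypothetical helper, not part of the recorder, and treating every section except session as required is an assumption drawn from the list above.

```python
import json

# Top-level sections named in this document. Treating all of them except
# "session" as required is an assumption made for this sketch.
REQUIRED_SECTIONS = (
    "episode", "profile", "capture", "sensors", "recorded_topics", "provenance",
)
OPTIONAL_SECTIONS = ("session",)


def check_sections(manifest: dict) -> dict:
    """Report missing required sections and unexpected top-level keys."""
    known = set(REQUIRED_SECTIONS) | set(OPTIONAL_SECTIONS)
    return {
        "missing": [name for name in REQUIRED_SECTIONS if name not in manifest],
        "unexpected": sorted(set(manifest) - known),
        "has_session": "session" in manifest,
    }


# A trimmed manifest without a session section, parsed from JSON text.
manifest = json.loads("""
{
  "episode": {"episode_id": "episode-20260406-190533"},
  "profile": {"name": "multisensor_20hz", "clock_policy": "host_capture_time_v1"},
  "capture": {"storage": {"bag_storage_id": "mcap"}},
  "sensors": {"devices": []},
  "recorded_topics": [],
  "provenance": {"git_commit": "<repo commit sha>"}
}
""")
report = check_sections(manifest)
```

Reporting unexpected keys rather than rejecting them keeps the check usable as a lint step for fields that crept in without a reader.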
What Each Section Means¶
episode¶
This is the human-facing description of the take:
- episode id
- task name
- language instruction
- active arms
- operator
This is the minimum answer to:
- what was this take supposed to be?
session¶
This is the resolved session snapshot when the take came from the operator console or another session-plan path.
It captures:
- resolved active arms
- chosen sensors file
- resolved devices
- resolved selected topics
This matters because the session plan is the concrete recording decision, not just a UI blob.
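Because session is conditional, readers should not index into it blindly. A minimal sketch, assuming a reader prefers the session's resolved arms and falls back to the episode section otherwise; `active_arms` is a hypothetical helper and the fallback order is an assumption, not documented recorder behavior.

```python
def active_arms(manifest: dict) -> list:
    """Return the active arms for a take, tolerating a missing session.

    Assumption: the resolved session value wins when a session section
    exists; otherwise the episode section is used.
    """
    session = manifest.get("session")
    if session is not None:
        return list(session["active_arms"])
    return list(manifest["episode"]["active_arms"])


# A take recorded without a session-plan path still resolves its arms.
arms = active_arms({"episode": {"active_arms": ["lightning"]}})
```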
profile¶
This is the conversion-policy reference recorded at take time.
Today it stores only:
- profile name
- clock policy
The manifest does not inline the full reusable YAML.
capture¶
This is the raw bag write metadata:
- start and end time
- bag storage backend
This is where the manifest says how the raw bag was written.
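As an illustration of reading this section, the sketch below derives the take duration from the two nanosecond timestamps in the example manifest. `capture_duration_s` is a hypothetical helper, and rejecting an end time earlier than the start time is an assumption about what a reader would want.

```python
def capture_duration_s(capture: dict) -> float:
    """Return the take duration in seconds from the capture section."""
    start_ns = capture["start_time_ns"]
    end_ns = capture["end_time_ns"]
    if end_ns < start_ns:
        raise ValueError("capture ends before it starts")
    return (end_ns - start_ns) / 1e9


# Values copied from the example manifest shape.
capture = {
    "start_time_ns": 1712448333000000000,
    "end_time_ns": 1712448341000000000,
    "storage": {"bag_storage_id": "mcap"},
}
duration = capture_duration_s(capture)  # 8.0 seconds
```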
sensors¶
This ties the take back to the local rig description at record time.
It records:
- the selected sensors file path
- the selected calibration results file path when present
- resolved device entries under sensors.devices
Those device entries are where the detailed sensor metadata lives.
recorded_topics¶
This is the resolved topic inventory snapshot for the take.
It records only:
- topic name
- ROS message type
That is enough for readers that need to understand what the recorder actually captured, without stuffing static topic-contract prose into every episode.
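A flat list is easy to index. The sketch below shows one way a reader might check that the topics it needs were captured; `topic_types` and `missing_topics` are hypothetical helpers, and using /spark/cameras/world/scene_1 as a wanted topic is purely illustrative.

```python
def topic_types(recorded_topics: list) -> dict:
    """Map each recorded topic name to its ROS message type."""
    return {entry["topic"]: entry["message_type"] for entry in recorded_topics}


def missing_topics(recorded_topics: list, wanted: list) -> list:
    """Return the wanted topics that are absent from the inventory."""
    recorded = topic_types(recorded_topics)
    return [topic for topic in wanted if topic not in recorded]


recorded_topics = [
    {"topic": "/spark/session/teleop_active", "message_type": "std_msgs/msg/Bool"},
]
types = topic_types(recorded_topics)
missing = missing_topics(
    recorded_topics,
    ["/spark/session/teleop_active", "/spark/cameras/world/scene_1"],
)
```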
provenance¶
This is where code-level provenance goes.
Current provenance includes:
- provenance.git_commit
That is the repository commit recorded at episode creation time.
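This document does not say how the recorder obtains the commit. One plausible approach, sketched here as an assumption, is to ask git for HEAD when the episode is created. `read_git_commit` and `build_provenance` are hypothetical names, and returning None when git or the directory is unavailable is a design choice of this sketch.

```python
import subprocess
from typing import Optional


def read_git_commit(repo_dir: str = ".") -> Optional[str]:
    """Return the HEAD commit sha of repo_dir, or None if it cannot be read."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # git is not installed, the directory is missing, or it is not a repo.
        return None
    return result.stdout.strip()


def build_provenance(git_commit: Optional[str]) -> dict:
    """Build the provenance section from an already-resolved commit."""
    return {"git_commit": git_commit}
```

Keeping the subprocess call separate from the section builder means the commit is resolved once at episode creation and passed in as plain data.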
What Belongs In The Manifest¶
The manifest should carry episode-specific or record-time-resolved truth such as:
- episode metadata
- resolved session metadata when the take came from the operator console or another session-plan path
- the active profile reference
- capture storage details
- resolved sensor device entries
- recorded topic inventory
- record-time provenance
For sensors, that includes record-time metadata such as:
- serial numbers or device paths
- camera model and firmware when available
- stream profiles and intrinsics when exposed by the bridge
- calibration snapshot when solved calibration exists
What Does Not Belong In The Manifest¶
The manifest should not become a dumping ground for reusable config or dead version markers.
Keep these outside it:
- the shared topic contract
- the full reusable conversion YAML
- the live sensors file as a mutable source of truth
- the live calibration file as a mutable source of truth
- archive-time transcode results
- decorative schema or inventory version fields that no reader uses
The rule is simple:
- store resolved episode truth
- do not duplicate reusable policy
- do not add fields just because they sound future-proof
Relationship To Other Files¶
Sensors file¶
The sensors file maps physical device identity to canonical sensor keys.
It answers:
- which physical device is usually /spark/cameras/world/scene_1?
It does not answer the full per-episode question of what was actually recorded.
Calibration results¶
calibration.local.json holds the current solved camera geometry.
It remains the working local results file, but the recorder snapshots the relevant solved values into the manifest so old episodes stay self-describing.
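The snapshot has to be a copy, not a reference, or a later recalibration would silently rewrite old episodes. A minimal sketch of that idea follows. The layout of calibration.local.json is not described here, so the mapping from sensor key to solved values, the `translation_m` and `calibration` field names, and `snapshot_calibration` itself are all hypothetical.

```python
import copy
from typing import Optional


def snapshot_calibration(live_results: dict, sensor_key: str) -> Optional[dict]:
    """Return an independent copy of the solved values for one sensor.

    Returns None when no solved calibration exists for that sensor.
    """
    solved = live_results.get(sensor_key)
    if solved is None:
        return None
    return copy.deepcopy(solved)


# Hypothetical live results, keyed by canonical sensor key.
sensor_key = "/spark/cameras/world/scene_1"
live = {sensor_key: {"translation_m": [0.1, 0.0, 0.5]}}

device_entry = {
    "sensor_key": sensor_key,
    "calibration": snapshot_calibration(live, sensor_key),
}

# A later recalibration changes the live results but not the snapshot.
live[sensor_key]["translation_m"][0] = 0.2
```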