
feat: record + replay vision pipelines from file logs #2491

Open
JosephTLockwood wants to merge 131 commits into PhotonVision:main from
JosephTLockwood:feature/file-log-replay

@JosephTLockwood

Contributor

Summary

Adds end-to-end file-log recording and in-place replay to PhotonVision so
robot-side vision pipelines can be tuned against captured frames after a match
without re-running the original camera. Builds on (and supersedes) the recording
work in #2183.

Recording

  • FrameRecorder writes a JPEG image-sequence (frames/000000.jpg …) plus a
    metadata.jsonl sidecar ({"seq":N,"capture_ns":T}) atomically, one frame
    per file. Chose this over an MP4/AVI container because:
    • no 2 GB container size cap; truncatable by file count after a crash,
    • random seek for free,
    • no codec compatibility risk (built into every OpenCV build), and
    • a half-flushed file at process death loses one frame, not the recording.
  • Recording is toggled via the recording / recordingRequest NT topics, or
    from robot code through new PhotonCamera.setRecording(boolean) /
    isRecording() (Java, C++, Python).
  • A tss.json is written alongside the sidecar at recording start, capturing
    the time-sync-server offset so per-pipeline JSON exports can later place
    capture_ns into the TSS time base for AKit replay.
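
A minimal sketch of the on-disk layout described above. Only the `frames/%06d.jpg` naming and the `{"seq":N,"capture_ns":T}` sidecar shape come from this PR; `FrameLayoutSketch` is a hypothetical class, and the temp-file-plus-`ATOMIC_MOVE` rename is an assumed way to get the "atomic, one frame per file" property, not necessarily what FrameRecorder does.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch of the recording layout; not the actual FrameRecorder. */
final class FrameLayoutSketch {
    /** frames/000000.jpg, frames/000001.jpg, ... (zero-padded to six digits). */
    static Path framePath(Path recordingDir, long seq) {
        return recordingDir.resolve("frames").resolve(String.format("%06d.jpg", seq));
    }

    /** One sidecar line per frame: {"seq":N,"capture_ns":T}. */
    static String sidecarLine(long seq, long captureNs) {
        return "{\"seq\":" + seq + ",\"capture_ns\":" + captureNs + "}";
    }

    /** Writes already-encoded JPEG bytes, then appends the matching sidecar line. */
    static void writeFrame(Path recordingDir, long seq, long captureNs, byte[] jpeg)
            throws IOException {
        Path target = framePath(recordingDir, seq);
        Files.createDirectories(target.getParent());
        // ASSUMPTION: write to a temp file in the same directory, then rename it into
        // place, so a crash never leaves a half-written frames/NNNNNN.jpg behind.
        Path tmp = Files.createTempFile(target.getParent(), "frame", ".tmp");
        Files.write(tmp, jpeg);
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        // Append the sidecar line only after the frame file exists on disk.
        Files.writeString(
                recordingDir.resolve("metadata.jsonl"),
                sidecarLine(seq, captureNs) + "\n",
                StandardCharsets.UTF_8,
                StandardOpenOption.CREATE,
                StandardOpenOption.APPEND);
    }
}
```

Writing the frame before its sidecar line keeps the two in lockstep: after a crash, every sidecar row points at a complete frame.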

Replay

  • New FileLogFrameProvider decodes frames/ + metadata.jsonl back into the
    vision pipeline at the recorded cadence, propagating capture_ns verbatim
    into Frame.timestampNanos so downstream NT pose observations carry the
    original capture timestamp.
  • Replay is in-place: VisionModule.startReplay(recording) swaps the live
    FrameProvider on the running VisionSource, runs the pipeline against the
    recorded frames, and swaps back at EOF (or on explicit cancel). No second
    VisionModule instance, no parallel runner.
  • A JsonResultExporter tees each CVPipelineResult to a per-pipeline-hash
    results/<hash>.jsonl file in the recording directory while a replay is active, so the
    user can re-tune pipeline parameters and diff result sets across runs.
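
Pacing at the recorded cadence can be sketched as waiting for the capture_ns delta between consecutive sidecar rows. This is an illustration under assumptions: `ReplayPacingSketch` and `SidecarEntry` are hypothetical names, and the regex accepts only the compact line shape the recorder writes (the real reader uses Jackson).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ReplayPacingSketch {
    /** One metadata.jsonl row; captureNs is passed through verbatim as the frame timestamp. */
    record SidecarEntry(long seq, long captureNs) {}

    private static final Pattern LINE =
            Pattern.compile("\\{\"seq\":(\\d+),\"capture_ns\":(-?\\d+)\\}");

    static SidecarEntry parse(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("unrecognised sidecar line: " + line);
        }
        return new SidecarEntry(Long.parseLong(m.group(1)), Long.parseLong(m.group(2)));
    }

    /** Delay (ns) before emitting each frame; the first frame is emitted immediately. */
    static List<Long> pacingDelaysNs(List<SidecarEntry> entries) {
        List<Long> delays = new ArrayList<>();
        for (int i = 0; i < entries.size(); i++) {
            long delta = i == 0 ? 0 : entries.get(i).captureNs() - entries.get(i - 1).captureNs();
            delays.add(Math.max(0, delta)); // never wait a negative amount
        }
        return delays;
    }
}
```

A replay loop could then call `Thread.sleep(d / 1_000_000, (int) (d % 1_000_000))` for each delay `d` before publishing the frame with its original `captureNs`.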

NT contract additions

  • Per-camera subtable: recording, recordingRequest, isReplaying,
    replayProgressCurrentFrame, replayProgressTotalFrames,
    replayProgressRecordingName.

UI

  • New `RecordingsCard.vue` on the General Settings page: per-camera list of
    recordings, inline processed-stream preview during replay, per-row Replay
    button + results dropdown with auto-download on replay end.
  • New top-of-app `ReplayBanner.vue` showing live replay progress across all
    cameras, polled through a `useReplayStatus` composable against a new
    `GET /api/recordings/replay/status` endpoint.

Server

  • Recording lifecycle endpoints (`exportIndividual`, `exportCamera`,
    `export`, `delete`, `nuke`) plus replay endpoints (`replay`,
    `replay/cancel`, `replay/status`, `results`, `result`). All
    user-supplied paths are routed through a new `PathSafety.safeResolve`
    helper to block traversal.
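
The traversal guard can be sketched with `java.nio.file.Path` normalisation: resolve the untrusted segment against the base, normalise, and reject anything that escapes the base. `PathSafetySketch` is a hypothetical stand-in; the real `PathSafety.safeResolve` signature is not shown in this PR.

```java
import java.nio.file.Path;

final class PathSafetySketch {
    /**
     * Resolves an untrusted relative path under base, rejecting absolute paths and
     * anything that normalises to a location outside base (e.g. "../../etc/passwd").
     */
    static Path safeResolve(Path base, String untrusted) {
        Path root = base.toAbsolutePath().normalize();
        Path candidate = Path.of(untrusted);
        if (candidate.isAbsolute()) {
            throw new IllegalArgumentException("absolute path not allowed: " + untrusted);
        }
        Path resolved = root.resolve(candidate).normalize();
        // Path.startsWith compares whole name elements, so "/base2" does not match "/base".
        if (!resolved.startsWith(root)) {
            throw new IllegalArgumentException("path escapes base directory: " + untrusted);
        }
        return resolved;
    }
}
```

This check is purely lexical; it does not follow symlinks, so a symlink inside the recordings directory could still point elsewhere.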

Testing

  • New JUnit coverage for the recording → sidecar invariant, the provider's
    parse-and-pace path, the JSON exporter's TSS-shifting math, and the
    end-to-end record → replay → export round-trip (`JsonResultEndToEndTest`,
    `FrameRecorderTssSnapshotTest`, `FileLogFrameProviderTest`,
    `MetadataSidecarReaderTest`, `VisionSourceFrameProviderSwapTest`, …).
  • Existing `VisionSourceManagerEnumerationTest` extended to cover the
    recorded-source enumeration / filter path.
  • Smoke-tested locally on Linux under WSL2 (Linux being the deployment target)
    against a real USB camera: record → replay → in-browser preview → JSON download.

Test plan

  • CI green across all platforms
  • Record from a USB camera, stop, replay through the per-row button — UI
    shows progress, banner appears, results dropdown populates on EOF
  • `PhotonCamera.setRecording(true)` from a robot project flips the
    `recording` topic and lights up the indicator
  • Replay survives a pipeline parameter change mid-replay (results re-export
    under a new hash)
  • Export-all zip contains `{frames/, metadata.jsonl, tss.json}` for every
    recording

Notes for reviewers

  • Provider-swap entry/exit is gated through `VisionSource.getFrameProvider`
    / `setFrameProvider` so every camera type (USB, libcamera, file, test)
    shares the same swap surface — no per-source code paths.
  • `FileLogFrameProvider.setRecording` is a deliberate no-op (it logs a warning
    when asked to start recording) because it is called from the NT listener
    thread; throwing would propagate into the NT4 listener pool. The same fix is
    applied to `FileVisionSource` / `LibcameraGpuSource`.
  • JPEG quality is fixed at 85 (~50–150 KB/frame at 1080p) and is not yet
    exposed as a setting.
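
The listener-thread rule in the notes above reduces to "log, don't throw". A sketch assuming `java.util.logging` in place of PhotonVision's own logger; `ReplayOnlyProviderSketch` and its `isRecording()` are hypothetical.

```java
import java.util.logging.Logger;

final class ReplayOnlyProviderSketch {
    private static final Logger LOG =
            Logger.getLogger(ReplayOnlyProviderSketch.class.getName());

    /**
     * Called from the NT listener thread. Throwing here would propagate into the NT4
     * listener pool, so a request to start recording is logged and ignored instead.
     */
    void setRecording(boolean enabled) {
        if (enabled) {
            LOG.warning("setRecording(true) ignored: this frame provider cannot record");
        }
        // enabled == false matches the only state this provider supports: nothing to do.
    }

    boolean isRecording() {
        return false;
    }
}
```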

The cancel HTTP route was the only caller of onCancelReplayRequest;
with the ReplayPanel UI gone, no client hits it. VisionModule's
internal cancelReplay() is still needed (stop() invokes it to fence
the swap-back before tearing the runner down), but it doesn't need an
HTTP surface.

Why safe: contract item 8 mentions export/delete/nuke endpoints; the
optional replay/cancel route was never load-bearing. Replay naturally
ends on EOF and on module stop, both of which still work.

FrameLogFormat was a 35-LOC file holding a single 2-line method shared
between the writer (FrameRecorder) and reader (FileLogFrameProvider).
Move the method + format constant onto FrameRecorder as public static
framePath(...); the reader and the two tests import FrameRecorder for
that one method.

Result: single source of truth (the writer owns the on-disk shape),
one fewer file in the diff.

Why safe: contract item 1 (frames/<seq>.jpg layout) is identical —
same %06d.jpg format string just lives at a new fully-qualified name.
The integration test still asserts every frame exists at the expected
path.

…ecarReader javadocs

Class doc had design-retrospective content (H.264 vs JPEG tradeoff,
~8-10x storage delta, settings-during-EOF caveat, FOV import flow,
schema-forward-compat bullet list) that belongs in the PR description,
not in the source. Trimmed to one paragraph per class that pins down
the on-disk contract and EOF semantics the rest of the code relies on.

Why safe: prose-only change; no method signatures or behaviour
touched.

The 415-line developer reference at docs/replay.md was committed as a
free-standing file under docs/, but docs/source/ (the Sphinx tree
that becomes docs.photonvision.org) never linked it. It rendered
nowhere and lived only as a source-tree orphan.

PR description carries the same architectural overview for reviewers;
once this merges the wiki is a better home than a stray markdown
file. Source-of-truth javadocs on FrameRecorder / FileLogFrameProvider
/ JsonResultExporter remain in place.

Why safe: no code touched; no published docs page disappears.

- Drop getInputMatMarksConnectedEvenIfIsConnectedNeverCalled: it
  pinned down a private cameraPropertiesCached flag transition that
  is purely a defensive detail; the EOF + pacing tests already
  exercise the full lifecycle.
- Trim dead context comments from the three refuses* tests and replace
  the verbose "copy all frames" helper with a one-liner Files.copy.

Why safe: contract item 7 (replay accept-set for frames/000000.jpg +
metadata.jsonl) is still enforced — all three rejection paths still
have a dedicated assertion.

- Merge failsOnMissingSeq + failsOnMissingCaptureNs into one
  failsOnMissingFields (same code path with different missing field).
- Drop failsOnNonNumericField: covered by the same Jackson
  canConvertToLong check the missing-field tests exercise.
- Drop failsOnShortTrailingLine: writer-crash mid-line case the
  provider's lockstep-with-frames pattern prevents from being reached
  in practice; the contract under test is "fail loudly with a line
  number", which the malformed-JSON test already covers.
- Drop closeReleasesFileHandle: Windows-specific FD-lock probe that
  inflates the test for behaviour Java's BufferedReader.close()
  guarantees.

Why safe: the schema-error path still has two tests pinning down
(a) missing required fields and (b) malformed JSON, both verifying
the 1-based line number is in the message. emptyFileIsEof +
handlesTrailingNewline still cover the EOF case.
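
The "fail loudly with a line number" contract the remaining tests pin down could look like the sketch below. `SidecarReaderSketch` is hypothetical, and regex field matching stands in for the Jackson `canConvertToLong` check; an empty file and a trailing newline both read as a clean EOF.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SidecarReaderSketch {
    private static final Pattern SEQ = Pattern.compile("\"seq\"\\s*:\\s*(-?\\d+)");
    private static final Pattern CAPTURE = Pattern.compile("\"capture_ns\"\\s*:\\s*(-?\\d+)");

    /** Returns {seq, capture_ns} pairs; throws with a 1-based line number on bad input. */
    static List<long[]> readAll(BufferedReader in) throws IOException {
        List<long[]> rows = new ArrayList<>();
        String line;
        int lineNo = 0;
        while ((line = in.readLine()) != null) { // a trailing "\n" yields no extra line
            lineNo++;
            String trimmed = line.trim();
            if (!trimmed.startsWith("{") || !trimmed.endsWith("}")) {
                throw new IOException("metadata.jsonl line " + lineNo + ": malformed JSON");
            }
            rows.add(new long[] {
                field(SEQ, trimmed, "seq", lineNo), field(CAPTURE, trimmed, "capture_ns", lineNo)
            });
        }
        return rows;
    }

    private static long field(Pattern p, String line, String name, int lineNo) throws IOException {
        Matcher m = p.matcher(line);
        if (!m.find()) {
            throw new IOException(
                    "metadata.jsonl line " + lineNo + ": missing or non-numeric \"" + name + "\"");
        }
        return Long.parseLong(m.group(1));
    }
}
```
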
The package-private 3-arg ctor was a thin alias that hard-coded
TssSample.INACTIVE. Tests now call the 4-arg ctor directly with
TssSample.INACTIVE when they don't exercise the tss.json hand-off,
so the alias is dead. Net -3 LOC on FrameRecorder, +2 LOC per test
call site (1 net delta), but one fewer constructor for reviewers to
trace.

Why safe: contract item 1 (frames + sidecar) and item 6 (release
lifecycle) are tested via the 4-arg ctor instead; no behaviour change.

TestSettables was a 60-line VisionSourceSettables stub that no test
in the suite ever invoked — TestVisionSource.getSettables was never
called by getterReturnsTheProviderRegisteredViaSetter, the swap
tests, or the replayRecordingDir family. Replace the body with a
throwing stub so a future test that actually needs it gets a clear
failure mode.

Why safe: the seven tests in this file exercise getFrameProvider /
setFrameProvider / getReplayRecordingDir directly; settables /
VideoMode / FrameStaticProperties live on a code path none of them
touch.

The sampled_at_wpi_nt_now_ns column in tss.json was written by
FrameRecorder + asserted by FrameRecorderTssSnapshotTest, but no
production consumer reads it — JsonResultExporter.readSnapshot
only deserialises tss_active_at_record + tss_offset_at_record_ns
into OffsetSnapshot.

The third TssSample column was the only difference between TssSample
and OffsetSnapshot; collapsing it removes a parallel data shape and
the per-call NetworkTablesJNI.now() that fed it.

Why safe: contract item 2 (capture_ns preservation) is unchanged.
The export-time offset still uses tssOffsetAtRecordNs verbatim; the
sampled-at column was diagnostic-only.
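
The export-time shift can be sketched as the two remaining tss.json fields applied to each capture_ns. This rests on an assumption the PR does not state: that the TSS time base is `local + offset` (flip the sign if `tss_offset_at_record_ns` is defined the other way). `OffsetSnapshotSketch` is a hypothetical mirror of `OffsetSnapshot`.

```java
import java.util.OptionalLong;

/** Hypothetical mirror of the two tss.json fields the exporter actually reads. */
record OffsetSnapshotSketch(boolean tssActiveAtRecord, long tssOffsetAtRecordNs) {
    /**
     * Places a recorded capture_ns into the TSS time base. Returns empty when TSS was
     * inactive at record time, since there is then no offset to apply.
     * ASSUMPTION: tss_time = local_time + offset.
     */
    OptionalLong toTssNs(long captureNs) {
        return tssActiveAtRecord
                ? OptionalLong.of(captureNs + tssOffsetAtRecordNs)
                : OptionalLong.empty();
    }
}
```

Because tss.json is written once at recording start, every frame in a recording shifts by the same constant.
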
Collapse the two distinct snapshot-warning branches (missing-snapshot
and tss-inactive) into one — they produce the same downstream
failure on the AKit consumer side and are best reported with the
same message. Drop the inline recordingName local since it's only
referenced once. Shorten the method javadoc — the load-bearing
contract (provider-based gating, lazy init, close on swap-back) is
preserved; the matchedCameraInfo aside is moved into a single
sentence.

Why safe: no behaviour change. Same snapshot still flows into the
exporter; same failure-latch via jsonExporterDisabled.

The standalone-park branch was a 5-line helper with one call site;
inlining puts the swap-aware-vs-standalone decision in one place so
the EOF semantics are easier to read end-to-end. Early-return the
no-callback case to flatten the if-else nest in
enterStoppedFiringEofOnce too. Pure refactor.

Why safe: contract item 3 (vision thread parks at EOF until
interrupted in standalone mode; swap-aware consumers see empty
frames) is unchanged — same conditions trigger the same outcome.
parksTheVisionThreadAtEof and eofWithoutCallbackPreservesParkingBehaviour
still pin both branches.
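
The two EOF branches can be sketched as follows. With a swap-back callback registered, EOF fires it exactly once. Without one (standalone mode), the vision thread parks until interrupted. `EofSketch` and its members are hypothetical names, not the provider's real API.

```java
import java.util.concurrent.atomic.AtomicBoolean;

final class EofSketch {
    private final AtomicBoolean eofFired = new AtomicBoolean(false);
    private final Runnable onEof; // null in standalone mode

    EofSketch(Runnable onEof) {
        this.onEof = onEof;
    }

    /** Called by the vision thread each time it asks for a frame past the end. */
    void enterStoppedFiringEofOnce() throws InterruptedException {
        if (onEof == null) {
            // Standalone: park until the vision thread is interrupted.
            Object parked = new Object();
            synchronized (parked) {
                while (true) {
                    parked.wait(); // throws InterruptedException on interrupt
                }
            }
        }
        // Swap-aware: fire the callback exactly once; later calls just return and the
        // caller hands back an empty frame.
        if (eofFired.compareAndSet(false, true)) {
            onEof.run();
        }
    }
}
```
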
The comment block on the volatile field had grown to five lines
covering memory-model history. Collapse to two lines: the volatile
exists because the vision thread reads the field unsynchronised; the
synchronised writers in USBFrameProvider.setRecording / release pair
with it. Snapshot pattern is documented at the call site.

Why safe: prose-only.

The two getInputMat branches had identical 6-line snapshot blocks
(read volatile field, null-check, isRecording, recordFrame). Extract
to a private helper so the snapshot rationale lives in one place.

Why safe: same field-read semantics — single volatile read on entry,
local check + use. Contract item 6 (concurrent setRecording cannot
orphan a FrameRecorder) is enforced by synchronized setRecording /
release, untouched here.
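
The single-volatile-read pattern from the last two commits can be sketched as below. `SnapshotPatternSketch` and `RecorderSketch` are hypothetical stand-ins for `USBFrameProvider` and `FrameRecorder`. The vision thread reads the field once into a local, so a concurrent `setRecording(false)` cannot null it out between the check and the use.

```java
final class SnapshotPatternSketch {
    /** Hypothetical stand-in for FrameRecorder. */
    static final class RecorderSketch {
        private volatile boolean recording = true;
        int framesRecorded;

        boolean isRecording() {
            return recording;
        }

        void recordFrame(Object frame) {
            framesRecorded++;
        }

        void release() {
            recording = false;
        }
    }

    // Read unsynchronised by the vision thread; written only under this object's monitor.
    private volatile RecorderSketch recorder;

    synchronized void setRecording(boolean enabled) {
        if (enabled && recorder == null) {
            recorder = new RecorderSketch();
        } else if (!enabled && recorder != null) {
            recorder.release();
            recorder = null;
        }
    }

    synchronized void release() {
        setRecording(false);
    }

    /** Helper shared by both getInputMat branches: one volatile read, then local use. */
    void maybeRecord(Object frame) {
        RecorderSketch snapshot = recorder; // single volatile read
        if (snapshot != null && snapshot.isRecording()) {
            snapshot.recordFrame(frame);
        }
    }

    RecorderSketch currentRecorder() {
        return recorder;
    }
}
```
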
Two regressions on /api/recordings/exportIndividual that surfaced during
physical smoke:

- The try-with-resources on FileInputStream closed the stream before
  Javalin's async writer finished, yielding "Stream Closed" 500s
  client-side. Read the zip bytes upfront and pass a byte[] to
  ctx.result(), which is synchronous.

- The Content-Disposition filename used cameraUniqueName, which is a
  UUID for matched cameras, giving downloads like
  c6bce502-..._2026-05-13_04-46-34.zip. Look up the user-set nickname
  from ConfigManager and sanitize it for filename use; fall back to the
  UUID only if no nickname is available.
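
The filename half of this fix can be sketched as "sanitised nickname if present, else the UUID". The allowed character set and the `DownloadNameSketch` helper are assumptions; the commit only says the nickname is sanitised for filename use.

```java
import java.util.Optional;

final class DownloadNameSketch {
    /** ASSUMPTION: replace anything outside [A-Za-z0-9._-] with '_' and drop leading '.'/'_'. */
    static String sanitize(String raw) {
        String cleaned = raw.trim().replaceAll("[^A-Za-z0-9._-]", "_");
        return cleaned.replaceAll("^[._]+", "");
    }

    /** Prefer the user-set nickname; fall back to the camera's unique name (a UUID). */
    static String zipName(Optional<String> nickname, String cameraUniqueName, String recordingName) {
        String base = nickname.map(DownloadNameSketch::sanitize)
                .filter(s -> !s.isEmpty())
                .orElse(cameraUniqueName);
        return base + "_" + recordingName + ".zip";
    }
}
```

A nickname that sanitises to nothing (e.g. "///") also falls back to the UUID, so the download never gets an empty prefix.
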
Mirrors the existing onStartReplayRequest handler: takes the same
CommonCameraUniqueName body shape, looks up the VisionModule by
uniqueName, and forwards to VisionModule.cancelReplay (already
implemented in 3151b4f — fires FileLogFrameProvider EOF and the
standard swap-back path).

The endpoint was dropped in c549e2a as unused, but the upcoming
ReplayBanner UI needs it for its Cancel button.

…ON access

Two new read-only endpoints that let the UI surface and download
individual results/<hash>.jsonl without forcing the user to grab the
whole recording zip:

  GET /api/recordings/results?camera=X&recording=Y
    Returns one JSON row per results/*.jsonl with hash, size, line
    count, pipeline_type, tss_active_at_record, and mtime. Parses each
    file's header line for the schema metadata. Empty array if the
    results/ dir doesn't exist yet.

  GET /api/recordings/result?camera=X&recording=Y&hash=Z
    Returns the raw .jsonl bytes with Content-Disposition naming the
    file <nickname>_<recording>_<hash>.jsonl. Same nickname resolution
    + filename sanitization as exportIndividual.

Both go through PathSafety.safeResolve for path-traversal protection.
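
Listing results/*.jsonl with hash, size, line count, and mtime can be sketched with `java.nio.file`. Header-line parsing (`pipeline_type`, `tss_active_at_record`) is omitted. `ResultsListingSketch` is hypothetical, and treating the file stem as the hash follows the `results/<hash>.jsonl` naming. Sorting newest-first is an added assumption that makes "freshest result" a first-element lookup.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Stream;

final class ResultsListingSketch {
    record ResultFile(String hash, long sizeBytes, long lineCount, long mtimeMillis) {}

    /** Empty list if results/ does not exist yet, mirroring the endpoint's behaviour. */
    static List<ResultFile> list(Path recordingDir) throws IOException {
        Path results = recordingDir.resolve("results");
        List<ResultFile> out = new ArrayList<>();
        if (!Files.isDirectory(results)) {
            return out;
        }
        try (Stream<Path> files = Files.list(results)) {
            for (Path p : (Iterable<Path>) files::iterator) {
                String name = p.getFileName().toString();
                if (!name.endsWith(".jsonl")) {
                    continue; // ignore anything that is not a result file
                }
                long lines;
                try (Stream<String> s = Files.lines(p)) {
                    lines = s.count();
                }
                out.add(new ResultFile(
                        name.substring(0, name.length() - ".jsonl".length()),
                        Files.size(p),
                        lines,
                        Files.getLastModifiedTime(p).toMillis()));
            }
        }
        out.sort(Comparator.comparingLong(ResultFile::mtimeMillis).reversed()); // newest first
        return out;
    }
}
```
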
VisionModule now tracks replayCurrentFrame + replayTotalFrames +
replayRecordingName alongside the existing isReplaying flag, updated
by the same onProgress hook that already publishes NT topics. A new
ReplayStatus record + getReplayStatus() getter exposes the snapshot.

GET /api/recordings/replay/status walks all VisionModules and returns
an array of {cameraUniqueName, recordingName, currentFrame,
totalFrames} for the ones currently replaying. The upcoming
ReplayBanner Vue component polls this every ~250ms; on a false-edge
transition from "any replaying" to "none" it triggers an
auto-download of the freshest results/<hash>.jsonl. Polling avoids
taking a dependency on PV's NT websocket bridge for what's a thin
display concern.
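
The status rows and the client's false-edge check can be sketched together in Java, although the real edge detection lives in the Vue client. `ReplayStatusSketch` is a hypothetical name; its record fields mirror the JSON keys listed above.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

final class ReplayStatusSketch {
    /** One row of GET /api/recordings/replay/status. */
    record ReplayStatus(String cameraUniqueName, String recordingName, long currentFrame, long totalFrames) {}

    /** Cameras that were replaying on the previous poll but are absent from this one. */
    static Set<String> endedSince(List<ReplayStatus> previous, List<ReplayStatus> current) {
        Set<String> stillActive = current.stream()
                .map(ReplayStatus::cameraUniqueName)
                .collect(Collectors.toSet());
        return previous.stream()
                .map(ReplayStatus::cameraUniqueName)
                .filter(name -> !stillActive.contains(name))
                .collect(Collectors.toSet());
    }
}
```

Each camera returned by `endedSince` is a candidate for the auto-download of its freshest results file.
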
Polls GET /api/recordings/replay/status on a configurable interval
(default 250 ms) and exposes the active list as a reactive ref. Bare
catch keeps the last good value across transient network blips so a
single 500 doesn't flap the consuming UI.

Used by the upcoming ReplayBanner and the RecordingsCard inline preview
+ auto-download logic. No callers wired in yet.
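
The "keep the last good value" rule is language-agnostic. A Java sketch of one poll tick, assuming a `Callable` that performs the HTTP GET (`LastGoodPollerSketch` is hypothetical; the real composable is TypeScript):

```java
import java.util.List;
import java.util.concurrent.Callable;

final class LastGoodPollerSketch<T> {
    private final Callable<List<T>> fetch;
    private List<T> active = List.of(); // what the UI renders

    LastGoodPollerSketch(Callable<List<T>> fetch) {
        this.fetch = fetch;
    }

    /** One poll tick: on any failure, keep the previous value so the UI does not flap. */
    List<T> tick() {
        try {
            active = fetch.call();
        } catch (Exception ignored) {
            // Transient network blip or HTTP 500: deliberately keep the last good value.
        }
        return active;
    }
}
```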

Build/refresh:
  ./gradlew :photon-server:buildClient
  cp -r photon-client/dist/* photon-server/build/resources/main/web/
  (hard refresh)

Sticky top-of-viewport banner that renders only while at least one
camera is replaying. One row per active replay with nickname,
recording name, frame counter, and a Cancel button that POSTs to
/api/recordings/replay/cancel.

Implemented as a v-app-bar (not a sticky div) so Vuetify's layout
system reserves vertical space and v-main reflows correctly instead
of letting the banner overlay route content.

Polls /api/recordings/replay/status via useReplayStatus, so the
banner appears/disappears automatically as replays start/stop.

Cosmetic — collapses the v-progress-linear attribute group to a single
line per prettier's printWidth. Brings the file into format-ci compliance.

While a camera's replay is active (per useReplayStatus), render an
extra <tr> below its row containing a photon-camera-stream of the
camera's Processed output. The user watches replayed frames flow
through the live pipeline as the recording plays back.

Reuses photon-camera-stream untouched — passes camera-settings +
streamType=Processed. The component's onBeforeUnmount sets src=//:0
when the row disappears, so the MJPEG fetch stops cleanly.

The <tr> is wrapped in a <template v-for> so the preview row can be
a sibling of the data row inside the same v-for iteration.

Adds a per-row Results column with a chevron toggle. Expanding fetches
GET /api/recordings/results?camera=...&recording=... and renders the
list (hash prefix, pipeline type, result count, TSS chip, download
link per hash). Re-fetches on every open so newly-created tunings
appear without a full page reload.

Auto-download: useReplayStatus is watched for the false edge — entries
that were in active.value last tick but not this tick. For each ended
replay, the newest results/<hash>.jsonl downloads automatically via
a programmatic anchor (server's Content-Disposition supplies the
filename). Toggle is checkbox-controlled and persisted to localStorage
(key photonvision.recordings.autoDownloadResults, default on).

Updates the preview-row colspan from 7 to 8 to match the new column
count.

NT4 hands a subscriber the topic's last-published value the instant it
attaches, so the first result tee'd to the json exporter at replay
start is the live source's previous frame — not a replay frame. The
swap-back at EOF can emit one more stray frame from the live source
for the same reason. Both leak into the .jsonl as boundary noise that
AKit consumers have to special-case.

Read [firstCaptureNs, lastCaptureNs] from the recording's metadata.jsonl
at exporter open and drop any result whose capture_ns falls outside.
Falls back to no-filter when the sidecar is missing/unreadable (logged),
so a corrupt recording still produces some output.

Tests:
  - dropsResultsOutsideFrameWindow: focused on the new filter, verifies
    pre/post-window drops and inclusive boundary keeps.
  - JsonResultEndToEndTest: now exercises readFrameWindow against a
    real metadata.jsonl and asserts the bounds match CAPTURE_NS[].
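
The boundary filter reduces to an inclusive range check on capture_ns, with "no window" meaning pass-through. `FrameWindowSketch` is a hypothetical name; in the PR the bounds come from the first and last rows of metadata.jsonl.

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

/** Inclusive [firstCaptureNs, lastCaptureNs] window read from the recording's sidecar. */
record FrameWindowSketch(long firstCaptureNs, long lastCaptureNs) {
    boolean contains(long captureNs) {
        return captureNs >= firstCaptureNs && captureNs <= lastCaptureNs;
    }

    /** Drops boundary noise; an empty window (sidecar missing or unreadable) keeps everything. */
    static List<Long> filter(Optional<FrameWindowSketch> window, List<Long> resultCaptureNs) {
        if (window.isEmpty()) {
            return resultCaptureNs;
        }
        return resultCaptureNs.stream()
                .filter(window.get()::contains)
                .collect(Collectors.toList());
    }

    /** Bounds from the sidecar's capture_ns column, assumed to be in recording order. */
    static Optional<FrameWindowSketch> fromCaptureNs(List<Long> sidecarCaptureNs) {
        if (sidecarCaptureNs.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(new FrameWindowSketch(
                sidecarCaptureNs.get(0), sidecarCaptureNs.get(sidecarCaptureNs.size() - 1)));
    }
}
```
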
JavaDoc line-wrap drift collected on six files after recent edits.
spotlessApply rewraps long paragraphs and collapses one method-call
linebreak in FrameRecorder.sampleTssNow.

@JosephTLockwood requested a review from a team as a code owner May 15, 2026 22:36
@github-actions bot added labels frontend (Having to do with PhotonClient and its related items), photonlib (Things related to the PhotonVision library), backend (Things relating to photon-core and photon-server) May 15, 2026
@JosephTLockwood

Contributor Author

Heads-up for reviewers: while auditing the diff before opening I flagged two slices of pre-fork code from #2183 that this PR carries forward but doesn't otherwise depend on. Calling them out explicitly so a reviewer can decide whether to keep, finish, or drop them — happy to cut either in a follow-up commit if the answer is "drop":

1. Reserve-recording-space (~310 LOC, 7 files) — robot-code-triggered "delete oldest recordings to free disk for an upcoming match" workflow. Files:

  • HardwareManager.reserveRecordingSpace(VisionModule[]) + reserveRecordingSpace(double)
  • NetworkTablesManager reserve subscribers + onReserveRecordingSpaceChanged + getMatchData
  • NTDriverStation.compareMatchData + the printMatchData private→public+return-String refactor
  • PhotonUtils.reserveSpace() and PhotonCamera.willRecord() — new public photon-lib API
  • VisionModule.recordingSpaceNeeded() — only called from HardwareManager.reserveRecordingSpace

Things to weigh: the match-data sort relies on a substring(0, 1) heuristic over the printMatchData string ("Event xxx, Match Q12, Replay 0, ..."), which is fragile vs. the structured types the rest of the codebase uses; and reserveSpace() / willRecord() are new public photon-lib methods that lock us into supporting them.

Replay does not depend on any of this; if it's dropped, the recorder just stops accepting frames when disk fills (existing low-disk-space gate at FrameRecorder.MIN_DISK_SPACE_BYTES).
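
The low-disk-space gate can be sketched with `FileStore.getUsableSpace()`. The 1 GiB threshold below is a placeholder, since the actual value of `FrameRecorder.MIN_DISK_SPACE_BYTES` is not given in this thread, and `DiskGateSketch` is a hypothetical name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class DiskGateSketch {
    /** PLACEHOLDER threshold; not the real FrameRecorder.MIN_DISK_SPACE_BYTES value. */
    static final long MIN_DISK_SPACE_BYTES = 1L << 30; // 1 GiB

    /** Pure decision: accept a frame only while free space stays at or above the floor. */
    static boolean shouldAccept(long usableBytes, long minBytes) {
        return usableBytes >= minBytes;
    }

    /** Queries the filesystem that holds the recording directory. */
    static boolean shouldAcceptFrame(Path recordingDir) throws IOException {
        long usable = Files.getFileStore(recordingDir).getUsableSpace();
        return shouldAccept(usable, MIN_DISK_SPACE_BYTES);
    }
}
```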

2. RecordingStrategy enum (~30 LOC, 4 files + 3 frontend) — the enum has exactly one value, SNAPSHOTS. FrameRecorder.getSupportedStrategies() returns List.of(SNAPSHOTS). HardwareConfig carries the field through ctor/toString, and the frontend renders a one-item dropdown for it. Pure scaffolding for a hypothetical second strategy; can be reintroduced when that strategy actually lands.

Both buckets compile and ship cleanly on their own — happy to land them, drop them, or treat them as a separate follow-up.

@JosephTLockwood

Contributor Author

At this point I think the flow is solid and everything is working. I was able to record a video with AprilTag detection disabled, run it through replay to get poses, and then use the replay file in AKit replay to correct the robot pose. There are still a ton of decisions to be made on what to log or not log, file types, etc. I struggled with it and went through a ton of different versions (as you can probably see from the git history), but I think it's at least good enough for testing.

If you have a chance to test, let me know how it goes. Replay lives under Settings for now. The only thing I still want to do is see whether I can simplify anything further; I'm still not familiar with the PV library, so there is probably some duplicated code.

Quick rundown of what is being logged in the replay file:

[Image: rundown of replay file contents]

Here is the file I use for the IO implementation

VisionIOPhotonVisionJSONSimple.java
