feat: record + replay vision pipelines from file logs #2491
JosephTLockwood wants to merge 131 commits into
check disk space periodically while recording
The cancel HTTP route was the only caller of onCancelReplayRequest; with the ReplayPanel UI gone, no client hits it. VisionModule's internal cancelReplay() is still needed (stop() invokes it to fence the swap-back before tearing the runner down), but it doesn't need an HTTP surface. Why safe: contract item 8 mentions export/delete/nuke endpoints; the optional replay/cancel route was never load-bearing. Replay naturally ends on EOF and on module stop, both of which still work.
FrameLogFormat was a 35-LOC file holding a single 2-line method shared between the writer (FrameRecorder) and reader (FileLogFrameProvider). Move the method + format constant onto FrameRecorder as public static framePath(...); the reader and the two tests import FrameRecorder for that one method. Result: single source of truth (the writer owns the on-disk shape), one fewer file in the diff. Why safe: contract item 1 (frames/<seq>.jpg layout) is identical — same %06d.jpg format string just lives at a new fully-qualified name. The integration test still asserts every frame exists at the expected path.
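A minimal sketch of what a writer-owned path helper of this kind could look like. The class name `FramePathSketch` and the `(Path, long)` parameter list are assumptions made for illustration; only the `frames/<seq>.jpg` layout and the `%06d.jpg` format string come from the commit message.

```java
import java.nio.file.Path;

/** Illustrative stand-in for the writer-owned helper; not the PR's actual class. */
final class FramePathSketch {
    /** Format string named in the commit message: zero-padded six-digit sequence number. */
    static final String FRAME_FILE_FORMAT = "%06d.jpg";

    /** Resolves frames/<seq>.jpg under a recording directory (parameter list is assumed). */
    static Path framePath(Path recordingDir, long seq) {
        return recordingDir.resolve("frames").resolve(String.format(FRAME_FILE_FORMAT, seq));
    }
}
```

With one static method shared by reader and writer, a layout change only has to be made in one place.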
…ecarReader javadocs
Class doc had design-retrospective content (H.264 vs JPEG tradeoff, ~8-10x storage delta, settings-during-EOF caveat, FOV import flow, schema-forward-compat bullet list) that belongs in the PR description, not in the source. Trimmed to one paragraph per class that pins down the on-disk contract and EOF semantics the rest of the code relies on. Why safe: prose-only change; no method signatures or behaviour touched.
The 415-line developer reference at docs/replay.md was committed as a free-standing file under docs/, but docs/source/ (the Sphinx tree that becomes docs.photonvision.org) never linked it. It rendered nowhere and lived only as a source-tree orphan. PR description carries the same architectural overview for reviewers; once this merges the wiki is a better home than a stray markdown file. Source-of-truth javadocs on FrameRecorder / FileLogFrameProvider / JsonResultExporter remain in place. Why safe: no code touched; no published docs page disappears.
- Drop getInputMatMarksConnectedEvenIfIsConnectedNeverCalled: it pinned down a private cameraPropertiesCached flag transition that is purely a defensive detail; the EOF + pacing tests already exercise the full lifecycle.
- Trim the three refuses* tests of dead context comments and collapse the verbose "copy all frames" helper into a one-liner Files.copy.
Why safe: contract item 7 (replay accept-set for frames/000000.jpg + metadata.jsonl) is still enforced: all three rejection paths still have a dedicated assertion.
- Merge failsOnMissingSeq + failsOnMissingCaptureNs into one failsOnMissingFields (same code path with different missing field).
- Drop failsOnNonNumericField: covered by the same Jackson canConvertToLong check the missing-field tests exercise.
- Drop failsOnShortTrailingLine: writer-crash mid-line case that the provider's lockstep-with-frames pattern prevents from being reached in practice; the contract under test is "fail loudly with a line number", which the malformed-JSON test already covers.
- Drop closeReleasesFileHandle: Windows-specific FD-lock probe that inflates the test for behaviour Java's BufferedReader.close() guarantees.
Why safe: the schema-error path still has two tests pinning down (a) missing required fields and (b) malformed JSON, both verifying the 1-based line number is in the message. emptyFileIsEof + handlesTrailingNewline still cover the EOF case.
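The "fail loudly with a line number" contract can be illustrated with a small stand-in parser. This is a sketch only: the commit message says the real reader uses Jackson, whereas the version below uses regular expressions to stay dependency-free, and the names `SidecarParseSketch`, `Entry`, and `parse` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Regex-based stand-in for a metadata.jsonl reader; the PR itself uses Jackson. */
final class SidecarParseSketch {
    record Entry(long seq, long captureNs) {}

    private static final Pattern SEQ = Pattern.compile("\"seq\"\\s*:\\s*(-?\\d+)");
    private static final Pattern CAPTURE_NS = Pattern.compile("\"capture_ns\"\\s*:\\s*(-?\\d+)");

    /** Parses sidecar text; any schema error names the 1-based line it occurred on. */
    static List<Entry> parse(String sidecarText) {
        List<Entry> entries = new ArrayList<>();
        String[] lines = sidecarText.split("\\R");
        for (int i = 0; i < lines.length; i++) {
            if (lines[i].isBlank()) {
                continue; // tolerate blank lines; split already drops a trailing newline
            }
            int lineNumber = i + 1;
            entries.add(new Entry(
                    requiredLong(SEQ, lines[i], "seq", lineNumber),
                    requiredLong(CAPTURE_NS, lines[i], "capture_ns", lineNumber)));
        }
        return entries;
    }

    private static long requiredLong(Pattern pattern, String line, String field, int lineNumber) {
        Matcher m = pattern.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException(
                    "metadata.jsonl line " + lineNumber + ": missing or non-numeric field '" + field + "'");
        }
        return Long.parseLong(m.group(1));
    }
}
```

Missing and non-numeric fields fail through the same branch here, which mirrors the reasoning given for dropping the separate non-numeric test.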
The package-private 3-arg ctor was a thin alias that hard-coded TssSample.INACTIVE. Tests now call the 4-arg ctor directly with TssSample.INACTIVE when they don't exercise the tss.json hand-off, so the alias is dead. Net -3 LOC on FrameRecorder, +2 LOC per test call site (1 net delta), but one fewer constructor for reviewers to trace. Why safe: contract item 1 (frames + sidecar) and item 6 (release lifecycle) tested via the 4-arg ctor instead; no behaviour change.
TestSettables was a 60-line VisionSourceSettables stub that no test in the suite ever invoked — TestVisionSource.getSettables was never called by getterReturnsTheProviderRegisteredViaSetter, the swap tests, or the replayRecordingDir family. Replace the body with a throwing stub so a future test that actually needs it gets a clear failure mode. Why safe: the seven tests in this file exercise getFrameProvider / setFrameProvider / getReplayRecordingDir directly; settables / VideoMode / FrameStaticProperties live on a code path none of them touch.
The sampled_at_wpi_nt_now_ns column in tss.json was written by FrameRecorder + asserted by FrameRecorderTssSnapshotTest, but no production consumer reads it — JsonResultExporter.readSnapshot only deserialises tss_active_at_record + tss_offset_at_record_ns into OffsetSnapshot. The third TssSample column was the only difference between TssSample and OffsetSnapshot; collapsing it removes a parallel data shape and the per-call NetworkTablesJNI.now() that fed it. Why safe: contract item 2 (capture_ns preservation) is unchanged. The export-time offset still uses tssOffsetAtRecordNs verbatim; the sampled-at column was diagnostic-only.
Collapse the two distinct snapshot-warning branches (missing-snapshot and tss-inactive) into one — they produce the same downstream failure on the AKit consumer side and are best reported with the same message. Drop the inline recordingName local since it's only referenced once. Shorten the method javadoc — the load-bearing contract (provider-based gating, lazy init, close on swap-back) is preserved; the matchedCameraInfo aside is moved into a single sentence. Why safe: no behaviour change. Same snapshot still flows into the exporter; same failure-latch via jsonExporterDisabled.
The standalone-park branch was a 5-line helper with one call site; inlining puts the swap-aware-vs-standalone decision in one place so the EOF semantics are easier to read end-to-end. Early-return the no-callback case to flatten the if-else nest in enterStoppedFiringEofOnce too. Pure refactor. Why safe: contract item 3 (vision thread parks at EOF until interrupted in standalone mode; swap-aware consumers see empty frames) is unchanged — same conditions trigger the same outcome. parksTheVisionThreadAtEof and eofWithoutCallbackPreservesParkingBehaviour still pin both branches.
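The two EOF branches described above can be sketched as a small latch. Everything here is hypothetical naming (`EofLatchSketch`, `enterStopped`); the sketch only models the decision the commit describes: early-return when there is no callback (the standalone case, where the caller parks), and otherwise fire the EOF callback exactly once.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Illustrative model of the EOF decision; not the provider's real code. */
final class EofLatchSketch {
    private final AtomicBoolean eofFired = new AtomicBoolean(false);
    private final Runnable onEof; // null models standalone mode (assumption)

    EofLatchSketch(Runnable onEof) {
        this.onEof = onEof;
    }

    /**
     * Called each time the provider is asked for a frame after EOF.
     * Returns true if the caller should park the vision thread (standalone),
     * false if it should hand back an empty frame (swap-aware consumer).
     */
    boolean enterStopped() {
        if (onEof == null) {
            return true; // early return: no callback, so the caller parks
        }
        if (eofFired.compareAndSet(false, true)) {
            onEof.run(); // fires once, even if several calls race
        }
        return false;
    }
}
```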
The comment block on the volatile field had grown to five lines covering memory-model history. Collapse to two lines: the volatile exists because the vision thread reads the field unsynchronised; the synchronised writers in USBFrameProvider.setRecording / release pair with it. Snapshot pattern is documented at the call site. Why safe: prose-only.
The two getInputMat branches had identical 6-line snapshot blocks (read volatile field, null-check, isRecording, recordFrame). Extract to a private helper so the snapshot rationale lives in one place. Why safe: same field-read semantics — single volatile read on entry, local check + use. Contract item 6 (concurrent setRecording cannot orphan a FrameRecorder) is enforced by synchronized setRecording / release, untouched here.
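A minimal sketch of the snapshot pattern the helper encapsulates, assuming a recorder with `isRecording()` and `recordFrame(...)` methods. The interface, the `byte[]` frame type, and the class and method names are illustrative assumptions, not the PR's signatures.

```java
/** Illustrative model of the volatile-snapshot helper; not the PR's real code. */
final class RecorderSnapshotSketch {
    /** Minimal stand-in for the recorder; the real FrameRecorder API may differ. */
    interface Recorder {
        boolean isRecording();

        void recordFrame(byte[] jpegBytes);
    }

    // volatile: the vision thread reads this field without holding the lock.
    private volatile Recorder recorder;

    /** Writers synchronize so a concurrent start/stop cannot orphan a recorder. */
    synchronized void setRecorder(Recorder next) {
        recorder = next;
    }

    /** Vision-thread side: one volatile read, then only the local copy is used. */
    boolean recordIfActive(byte[] jpegBytes) {
        Recorder snapshot = recorder; // single volatile read on entry
        if (snapshot == null || !snapshot.isRecording()) {
            return false;
        }
        snapshot.recordFrame(jpegBytes);
        return true;
    }
}
```

Reading the field once into a local means a writer that nulls the field between the check and the use cannot cause a NullPointerException on the vision thread.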
Two regressions on /api/recordings/exportIndividual that surfaced during physical smoke:
- The try-with-resources on FileInputStream closed the stream before Javalin's async writer finished, yielding "Stream Closed" 500s client-side. Read the zip bytes upfront and pass a byte[] to ctx.result(), which is synchronous.
- The Content-Disposition filename used cameraUniqueName, which is a UUID for matched cameras, giving downloads like c6bce502-..._2026-05-13_04-46-34.zip. Look up the user-set nickname from ConfigManager and sanitize it for filename use; fall back to the UUID only if no nickname is available.
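The nickname fallback in the second item could look roughly like the sketch below. The allow-list, the class name, and both method names are assumptions; the commit message does not say which characters the real sanitizer keeps, only that the nickname is preferred and the UUID is the fallback.

```java
/** Illustrative filename builder; the PR's actual sanitization rules are not shown in the source. */
final class DownloadNameSketch {
    /** Replaces runs of characters outside a conservative allow-list with an underscore. */
    static String sanitize(String raw) {
        return raw.trim().replaceAll("[^A-Za-z0-9._-]+", "_");
    }

    /** Prefers the user-set nickname; falls back to the unique name when none is usable. */
    static String zipName(String nickname, String cameraUniqueName, String recordingName) {
        String base = (nickname == null) ? "" : sanitize(nickname);
        if (base.isEmpty()) {
            base = sanitize(cameraUniqueName);
        }
        return base + "_" + sanitize(recordingName) + ".zip";
    }
}
```

For example, `zipName("Front Left/Cam", "c6bce502", "2026-05-13_04-46-34")` yields `Front_Left_Cam_2026-05-13_04-46-34.zip`.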
Mirrors the existing onStartReplayRequest handler: takes the same CommonCameraUniqueName body shape, looks up the VisionModule by uniqueName, and forwards to VisionModule.cancelReplay (already implemented in 3151b4f — fires FileLogFrameProvider EOF and the standard swap-back path). The endpoint was dropped in c549e2a as unused, but the upcoming ReplayBanner UI needs it for its Cancel button.
…ON access
Two new read-only endpoints that let the UI surface and download
individual results/<hash>.jsonl without forcing the user to grab the
whole recording zip:

GET /api/recordings/results?camera=X&recording=Y
  Returns one JSON row per results/*.jsonl with hash, size, line
  count, pipeline_type, tss_active_at_record, and mtime. Parses each
  file's header line for the schema metadata. Empty array if the
  results/ dir doesn't exist yet.

GET /api/recordings/result?camera=X&recording=Y&hash=Z
  Returns the raw .jsonl bytes with Content-Disposition naming the
  file <nickname>_<recording>_<hash>.jsonl. Same nickname resolution
  + filename sanitization as exportIndividual.

Both go through PathSafety.safeResolve for path-traversal protection.
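A common way to implement a traversal guard of this kind is to normalize the resolved path and check that it still sits under the base directory. The sketch below shows that general technique; the signature and exception type are assumptions, and the PR's `PathSafety.safeResolve` may differ (for instance in how it treats symlinks, which this version does not resolve).

```java
import java.nio.file.Path;

/** Illustrative traversal guard; not the PR's actual PathSafety implementation. */
final class PathSafetySketch {
    /**
     * Resolves a user-supplied relative path under baseDir and rejects anything
     * that escapes it, such as "../" segments or an absolute path.
     */
    static Path safeResolve(Path baseDir, String userSupplied) {
        Path base = baseDir.toAbsolutePath().normalize();
        Path candidate = base.resolve(userSupplied).normalize();
        if (!candidate.startsWith(base)) {
            throw new IllegalArgumentException("Path escapes base directory: " + userSupplied);
        }
        return candidate;
    }
}
```

`Path.startsWith` compares whole path elements, so a sibling directory such as `recordings-old` is not mistaken for a child of `recordings`.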
VisionModule now tracks replayCurrentFrame + replayTotalFrames +
replayRecordingName alongside the existing isReplaying flag, updated
by the same onProgress hook that already publishes NT topics. A new
ReplayStatus record + getReplayStatus() getter exposes the snapshot.
GET /api/recordings/replay/status walks all VisionModules and returns
an array of {cameraUniqueName, recordingName, currentFrame,
totalFrames} for the ones currently replaying. The upcoming
ReplayBanner Vue component polls this every ~250ms; on a false-edge
transition from "any replaying" to "none" it triggers an
auto-download of the freshest results/<hash>.jsonl. Polling avoids
taking a dependency on PV's NT websocket bridge for what's a thin
display concern.
Polls GET /api/recordings/replay/status on a configurable interval (default 250 ms) and exposes the active list as a reactive ref. Bare catch keeps the last good value across transient network blips so a single 500 doesn't flap the consuming UI. Used by the upcoming ReplayBanner and the RecordingsCard inline preview + auto-download logic. No callers wired in yet.
Build/refresh:
  ./gradlew :photon-server:buildClient
  cp -r photon-client/dist/* photon-server/build/resources/main/web/
  (hard refresh)
Sticky top-of-viewport banner that renders only while at least one camera is replaying. One row per active replay with nickname, recording name, frame counter, and a Cancel button that POSTs to /api/recordings/replay/cancel. Implemented as a v-app-bar (not a sticky div) so Vuetify's layout system reserves vertical space and v-main reflows correctly instead of letting the banner overlay route content. Polls /api/recordings/replay/status via useReplayStatus, so the banner appears/disappears automatically as replays start/stop.
Build/refresh:
  ./gradlew :photon-server:buildClient
  cp -r photon-client/dist/* photon-server/build/resources/main/web/
  (hard refresh)
Cosmetic — collapses the v-progress-linear attribute group to a single line per prettier's printWidth. Brings the file into format-ci compliance.
While a camera's replay is active (per useReplayStatus), render an extra <tr> below its row containing a photon-camera-stream of the camera's Processed output. The user watches replayed frames flow through the live pipeline as the recording plays back. Reuses photon-camera-stream untouched, passing camera-settings + streamType=Processed. The component's onBeforeUnmount sets src=//:0 when the row disappears, so the MJPEG fetch stops cleanly. The <tr> is wrapped in a <template v-for> so the preview row can be a sibling of the data row inside the same v-for iteration.
Build/refresh:
  ./gradlew :photon-server:buildClient
  cp -r photon-client/dist/* photon-server/build/resources/main/web/
  (hard refresh)
Adds a per-row Results column with a chevron toggle. Expanding fetches GET /api/recordings/results?camera=...&recording=... and renders the list (hash prefix, pipeline type, result count, TSS chip, download link per hash). Re-fetches on every open so newly-created tunings appear without a full page reload.
Auto-download: useReplayStatus is watched for the false edge, that is, entries that were in active.value last tick but not this tick. For each ended replay, the newest results/<hash>.jsonl downloads automatically via a programmatic anchor (server's Content-Disposition supplies the filename). Toggle is checkbox-controlled and persisted to localStorage (key photonvision.recordings.autoDownloadResults, default on).
Updates the preview-row colspan from 7 to 8 to match the new column count.
Build/refresh:
  ./gradlew :photon-server:buildClient
  cp -r photon-client/dist/* photon-server/build/resources/main/web/
  (hard refresh)
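The false-edge check amounts to a set difference between consecutive polls. The PR's client code is a Vue composable; the sketch below shows only the underlying logic, written in Java for consistency with the server code, and all names in it are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

/** Language-neutral model of the "was active last tick, gone this tick" check. */
final class ReplayEdgeSketch {
    private Set<String> previouslyActive = Set.of();

    /** Returns the cameras that were replaying on the previous poll but are absent now. */
    Set<String> endedSince(Set<String> currentlyActive) {
        Set<String> ended = new HashSet<>(previouslyActive);
        ended.removeAll(currentlyActive);
        previouslyActive = Set.copyOf(currentlyActive); // remember this tick for the next call
        return ended;
    }
}
```

Each camera returned by `endedSince` would be a candidate for the auto-download; a camera that is still replaying, or that was never seen, produces nothing.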
NT4 hands a subscriber the topic's last-published value the instant it
attaches, so the first result tee'd to the json exporter at replay
start is the live source's previous frame — not a replay frame. The
swap-back at EOF can emit one more stray frame from the live source
for the same reason. Both leak into the .jsonl as boundary noise that
AKit consumers have to special-case.
Read [firstCaptureNs, lastCaptureNs] from the recording's metadata.jsonl
at exporter open and drop any result whose capture_ns falls outside.
Falls back to no-filter when the sidecar is missing/unreadable (logged),
so a corrupt recording still produces some output.
Tests:
- dropsResultsOutsideFrameWindow: focused on the new filter, verifies
pre/post-window drops and inclusive boundary keeps.
- JsonResultEndToEndTest: now exercises readFrameWindow against a
real metadata.jsonl and asserts the bounds match CAPTURE_NS[].
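The window filter described in this commit can be sketched as follows. The `readFrameWindow` name appears in the test description, but its signature here (taking the sidecar's capture times as an array) is an assumption, as are the record and the `shouldExport` helper. An empty `Optional` models the "sidecar missing or unreadable" fallback, in which nothing is filtered.

```java
import java.util.Optional;

/** Illustrative model of the capture_ns window filter; not the exporter's real code. */
final class FrameWindowSketch {
    /** Inclusive [firstCaptureNs, lastCaptureNs] bounds taken from the sidecar. */
    record FrameWindow(long firstCaptureNs, long lastCaptureNs) {
        boolean contains(long captureNs) {
            return captureNs >= firstCaptureNs && captureNs <= lastCaptureNs;
        }
    }

    /** Builds the window from the sidecar's capture times, in file order. */
    static Optional<FrameWindow> readFrameWindow(long[] sidecarCaptureNs) {
        if (sidecarCaptureNs == null || sidecarCaptureNs.length == 0) {
            return Optional.empty(); // no usable sidecar: caller falls back to no filter
        }
        long first = sidecarCaptureNs[0];
        long last = sidecarCaptureNs[sidecarCaptureNs.length - 1];
        return Optional.of(new FrameWindow(first, last));
    }

    /** Keeps a result if there is no window, or if its capture_ns lies inside it. */
    static boolean shouldExport(Optional<FrameWindow> window, long captureNs) {
        return window.map(w -> w.contains(captureNs)).orElse(true);
    }
}
```

A stale live-source result stamped before the first recorded frame, or after the last one, falls outside the window and is dropped, while results exactly on either boundary are kept.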
JavaDoc line-wrap drift collected on six files after recent edits. spotlessApply rewraps long paragraphs and collapses one method-call linebreak in FrameRecorder.sampleTssNow.
Heads-up for reviewers: while auditing the diff before opening I flagged two slices of pre-fork code from #2183 that this PR carries forward but doesn't otherwise depend on. Calling them out explicitly so a reviewer can decide whether to keep, finish, or drop them — happy to cut either in a follow-up commit if the answer is "drop": 1. Reserve-recording-space (~310 LOC, 7 files) — robot-code-triggered "delete oldest recordings to free disk for an upcoming match" workflow. Files:
Things to weigh: the match-data sort relies on a Replay does not depend on any of this; if it's dropped, the recorder just stops accepting frames when disk fills (existing low-disk-space gate at 2. Both buckets compile and ship cleanly on their own; happy to land them, drop them, or treat them as a separate follow-up.

Summary
Adds end-to-end file-log recording and in-place replay to PhotonVision so
robot-side vision pipelines can be tuned against captured frames after a match
without re-running the original camera. Builds on (and supersedes) the recording
work in #2183.
Recording
`FrameRecorder` writes a JPEG image-sequence (`frames/000000.jpg`…) plus a `metadata.jsonl` sidecar (`{"seq":N,"capture_ns":T}`) atomically, one frame per file. Chose this over an MP4/AVI container because:
`recording` / `recordingRequest` NT topics, or from robot code through new `PhotonCamera.setRecording(boolean)` / `isRecording()` (Java, C++, Python).
`tss.json` is written alongside the sidecar at recording start, capturing the time-sync-server offset so per-pipeline JSON exports can later place `capture_ns` into the TSS time base for AKit replay.
Replay
`FileLogFrameProvider` decodes `frames/` + `metadata.jsonl` back into the vision pipeline at the recorded cadence, propagating `capture_ns` verbatim into `Frame.timestampNanos` so downstream NT pose observations carry the original capture timestamp.
`VisionModule.startReplay(recording)` swaps the live `FrameProvider` on the running `VisionSource`, runs the pipeline against the recorded frames, and swaps back at EOF (or on explicit cancel). No second `VisionModule` instance, no parallel runner.
`JsonResultExporter` tees each `CVPipelineResult` to a per-pipeline-hash `.json` file under the recording directory while a replay is active, so the user can re-tune pipeline parameters and diff result sets across runs.
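To make "at the recorded cadence" concrete, here is a minimal sketch of deriving per-frame waits from consecutive `capture_ns` values. How `FileLogFrameProvider` actually paces playback is not shown in this description, so the class, the method, and the clamping of negative gaps to zero are all assumptions.

```java
/** Illustrative pacing helper; the provider's real pacing logic may differ. */
final class ReplayPacingSketch {
    /**
     * Returns, for each frame, how long to wait after the previous frame before
     * emitting it. The first frame has no predecessor, so its delay is zero.
     */
    static long[] interFrameDelaysNs(long[] captureNs) {
        long[] delays = new long[captureNs.length];
        for (int i = 1; i < captureNs.length; i++) {
            // Clamp so an out-of-order timestamp never produces a negative wait.
            delays[i] = Math.max(0L, captureNs[i] - captureNs[i - 1]);
        }
        return delays;
    }
}
```

The emitted frame would still carry its original `capture_ns`; the computed delay only controls when the frame is handed to the pipeline.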
NT contract additions
`recording`, `recordingRequest`, `isReplaying`, `replayProgressCurrentFrame`, `replayProgressTotalFrames`, `replayProgressRecordingName`.
UI
recordings, inline processed-stream preview during replay, per-row Replay
button + results dropdown with auto-download on replay end.
cameras, polled through a `useReplayStatus` composable against a new
`GET /api/recordings/replay/status` endpoint.
Server
`export`, `delete`, `nuke`) plus replay endpoints (`replay`,
`replay/cancel`, `replay/status`, `results`, `result`). All
user-supplied paths are routed through a new `PathSafety.safeResolve`
helper to block traversal.
Testing
parse-and-pace path, the JSON exporter's TSS-shifting math, and the
end-to-end record → replay → export round-trip (`JsonResultEndToEndTest`,
`FrameRecorderTssSnapshotTest`, `FileLogFrameProviderTest`,
`MetadataSidecarReaderTest`, `VisionSourceFrameProviderSwapTest`, …).
recorded-source enumeration / filter path.
USB camera: record → replay → in-browser preview → JSON download.
Test plan
shows progress, banner appears, results dropdown populates on EOF
`recording` topic and lights up the indicator
under a new hash)
tss.json}` for every recording
Notes for reviewers
/ `setFrameProvider` so every camera type (USB, libcamera, file, test)
shares the same swap surface — no per-source code paths.
attempted-true) because it is called from the NT listener thread; throwing
would propagate into the NT4 listener pool. Same fix applied to
`FileVisionSource` / `LibcameraGpuSource`.