Skip to content

Add binary stream support to task-graph and job-queue#545

Open
sroussey wants to merge 2 commits into
mainfrom
claude/stoic-bell-RLJWT
Open

Add binary stream support to task-graph and job-queue#545
sroussey wants to merge 2 commits into
mainfrom
claude/stoic-bell-RLJWT

Conversation

@sroussey
Copy link
Copy Markdown
Collaborator

@sroussey sroussey commented Jun 3, 2026

Summary

Implements binary streaming support across the task-graph and job-queue packages, enabling efficient handling of large binary outputs (files, images, etc.) without materializing them into memory. Introduces a new "binary" stream mode alongside existing "append", "replace", and "object" modes, with intelligent accumulation decisions and cache-streaming optimization.

Key Changes

Core Types & Helpers (packages/task-graph/src/task/StreamTypes.ts)

  • Added "binary" to StreamMode type union
  • Introduced StreamBinaryDelta type for ordered byte chunks (Uint8Array)
  • Added materializeBinary() helper to concatenate chunks into Blob or ArrayBuffer based on schema format
  • Extended getPortStreamMode() and getStreamingPorts() to recognize binary ports
  • Added getBinaryPortId() to locate the first binary port in a schema
  • Added edgeNeedsAccumulation() to determine if a binary→non-binary edge requires materialization

StreamPump Enhancements (packages/task-graph/src/task-graph/StreamPump.ts)

  • canStreamBinaryToCache() — Static decision method (unit-testable in isolation) that returns true when:
    • Cache supports streaming (supportsStreaming() === true)
    • Task outputs only binary ports
    • No downstream consumer needs the materialized value
  • pipeBinaryToCache() — Assembles binary-delta events into an AsyncIterable<Uint8Array> and pipes to cache's saveOutputStream(), returning { promise, detach } for lifecycle management
  • Modified taskNeedsAccumulation() to skip accumulation when canStreamBinaryToCache() returns true, enabling direct cache ingestion without buffering

StreamProcessor Binary Accumulation (packages/task-graph/src/task/StreamProcessor.ts)

  • Added accumulatedBinary map to collect binary-delta chunks during streaming
  • On finish event, materializes accumulated binary chunks into Blob or ArrayBuffer per schema format
  • Explicit binary payload in finish event takes precedence over accumulated chunks

Cache & Repository Updates

  • TaskOutputRepository — Added optional saveOutputStream() method for streaming sinks; supportsStreaming() reflects presence of this method
  • RunPrivateCacheRepo — Forwards saveOutputStream() calls to backing repository with run-ID namespacing
  • CacheCoordinator — Added saveByStream() method (stub for Spec 2 integration with TaskRunner)

Job Queue Stream Support (packages/job-queue/src/job/)

  • JobQueueEventListeners — Added job_stream event type and JobStreamListener callback
  • JobQueueClient — Added onJobStream() subscription method and jobStreamListeners map; JobHandle now has optional onStream() method (present only on server-attached handles)
  • JobQueueWorker — Forwards emitStreamEvent() calls from job context to server's job_stream event
  • JobQueueServer — Broadcasts job_stream events to attached clients via forwardToClients()
  • Job — Added emitStreamEvent() context hook for jobs to emit stream events during execution

Test Coverage

Binary Stream Tests (packages/test/src/test/task-graph/)

  • StreamBinaryPump.test.ts — Comprehensive suite covering:
    • C1 (regression): Binary source → non-binary consumer materializes across edge
    • C2 (decision): canStreamBinaryToCache() returns true/false in isolation
    • C2 (accumulation): Real graph runs materialize bytes when decision is "accumulate"
    • Assembly: pipeBinaryToCache() feeds chunks to cache and resolves on stream end
    • Port filtering and binary port ID resolution
    • Listener detachment and timeout guards
  • **`

https://claude.ai/code/session_01EQqii18C8KWqix8fayyL7X

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 3, 2026

Coverage Report

Status Category Percentage Covered / Total
🔵 Lines 62.12% 24997 / 40238
🔵 Statements 61.98% 25862 / 41720
🔵 Functions 63.11% 4722 / 7482
🔵 Branches 50.81% 12284 / 24172
File CoverageNo changed files found.
Generated in workflow #2486 for commit 184128f by the Vitest Coverage Report Action

Comment thread packages/task-graph/src/cache/resolveRef.ts Fixed
@sroussey sroussey force-pushed the claude/stoic-bell-RLJWT branch from dac4185 to afe3943 Compare June 4, 2026 05:37
claude added 2 commits June 4, 2026 05:48
Spec 1 — binary-delta streaming framework
-----------------------------------------
Adds a `binary-delta` variant to `StreamEvent` (analogous to `text-delta` /
`object-delta`) plus an `x-stream: "binary"` annotation on output port
schemas, so a task can `executeStream` byte chunks the same way it streams
text or structured objects. New port helpers (`getBinaryPortId`,
`getBinaryPortFormat`, `getStreamingPorts`), a `materializeBinary`
assembler (Blob for `format: "blob"`/absent, ArrayBuffer for
`format: "binary"`), and a `getOutputStreamMode` adopter let downstream
code branch cleanly on binary mode without reaching for `any`.

StreamProcessor accumulates `binary-delta` chunks per port and merges
them into the enriched finish event so downstream dataflows see the
materialized payload (or, for explicit binary finish payloads, the
artifact wins per Spec 1's precedence rule).

StreamPump adds the graph-aware decision (`canStreamBinaryToCache`,
`anyConsumerNeedsMaterialized`) and the `pipeBinaryToCache` assembly
helper that turns a task's `binary-delta` events into an `AsyncIterable`
ready to drive a streaming cache sink.

`TaskOutputRepository` gains an optional `saveOutputStream` sink so
file-backed (or other stream-capable) caches can ingest bytes without
materializing the full payload; `supportsStreaming()` and the
`RunPrivateCacheRepo` wrapper forward the capability correctly.

Spec 2 — result-as-reference
----------------------------
Builds on Spec 1 to close the queue-row-bloat hole: when the cache backing
supports streaming, the runner pipes the binary bytes straight to the
cache and places a `CacheRef` placeholder in `Output` at the port slot.
Downstream `Output` consumers (and the queue row) see a small envelope
(`{ \$ref, size?, mime? }`) instead of the full payload, while the bytes
live in the cache for hydration on demand.

Pieces:

- `CacheRef` type + `isCacheRef` type guard (`cache/CacheRef.ts`).
- `resolveOutput` walker (`cache/resolveRef.ts`) — pure recursive walker
  that hydrates refs through a caller-supplied resolver. Identity is
  preserved when no descendant matches the optional filter; class
  instances (`Error`, `URL`, custom classes) survive with prototype
  intact; `Map`/`Set` are walked through so nested refs resolve; opaque
  leaves are `Blob`/`ArrayBuffer`/`TypedArray`/`Date`/`RegExp`/`Promise`.
- `resolveJobOutput` queue-boundary bridge (`cache/resolveJobOutput.ts`)
  accepting either a `CacheRefResolver` function or any object exposing
  `getOutputByRef` (`TaskOutputRepository` shape).
- `IRunConfig.referenceThresholdBytes` (default 64 KiB; `0` forces ref
  for every binary output).
- `TaskOutputRepository.saveOutputStream` now returns `Promise<CacheRef>`;
  new `getOutputByRef` / `getOutputStreamByRef` readers complete the
  contract.
- `CacheCoordinator.getBinaryRefSinksByPolicy` derives a per-port
  `BinaryRefSink` map; `hydrateRefsBelowThreshold` rehydrates refs whose
  committed size falls below the configured threshold (schema-restricted
  to binary streaming ports so legitimate `{\$ref: string}` fields in
  non-binary slots are not mistakenly hit against the cache).
- `StreamProcessor` routes `binary-delta` chunks to a `BinaryRefSink`
  via a small `BinaryStreamRouter` producer-consumer pump.
- `TaskRunner` reads the threshold, builds sinks, threads them through
  `StreamProcessor`, and rehydrates below-threshold refs in the post-run
  pass — saveByPolicy then writes the small ref-bearing Output.
- StreamProcessor TEES when both an accumulator and a router exist for
  a port (graph context where the cache can stream AND a downstream
  edge needs materialized bytes): the emitted finish event carries the
  materialized Blob/ArrayBuffer for edge consumers; `finalOutput`
  carries the CacheRef so the queue/cache row stays small.
- `RunPrivateCacheRepo` forwards all three new optional methods,
  mirroring the backing's true capability on the wrapper instance
  (assigning `undefined` when the backing lacks them) so callers
  probing `typeof === "function"` see the truth.

Tests cover binary-delta accumulation + explicit-finish-payload
precedence, port helpers, cache decision + assembly, runner pipe + force-ref +
threshold rehydrate, tee for the graph + materializing-consumer case,
saved-row size + cross-process serialization round-trip + dangling-ref
best-effort, and the walker / `resolveOutput` / `resolveJobOutput`
surface (class instances, Map/Set, sparse-ref filter, concurrency bound,
identity preservation).
Adds a same-process channel so a holder of a `JobHandle` can subscribe
to a running job's stream events (text deltas, object deltas,
binary-delta chunks, snapshot, finish, error, phase) instead of only
the terminal result.

Worker side
-----------
- `IJobExecuteContext` gains an optional `emitStreamEvent(event)` method.
- `JobQueueWorker` plumbs a per-job event emitter through into the
  execute context so a run-fn can call `ctx.emitStreamEvent(...)` to
  publish stream chunks as they're produced.

Server side
-----------
- `JobQueueServer.forwardToClients("handleJobStream", jobId, event)`
  fans the event to every attached client by direct method invocation —
  pure in-memory, no `postMessage`, no serialization, no worker thread.
  The channel is intentionally same-process only; storage-backed cross-
  process clients see state transitions through `subscribeToChanges`
  but receive no incremental stream events.

Client side
-----------
- `JobHandle.onStream(callback)` is exposed only when the client is
  server-attached (`this.server` set); callers branch on
  `typeof handle.onStream === "function"`.
- Each listener invocation is wrapped in try/catch so one throwing
  subscriber does not abort delivery to the rest or break the dispatch.

Tests
-----
- `JobQueueStream.test.ts` proves end-to-end same-process delivery: a
  worker's `emitStreamEvent` calls reach every `JobHandle.onStream`
  listener in order.
- `JobQueueStreamWorker.integration.test.ts` (+ its `.fixture.mjs`)
  validates the underlying Node `worker_threads` transfer mechanism
  the design relies on for any future cross-thread queue host: binary
  chunks emitted from a worker thread transfer (not copy) to the host
  per `WorkerServerBase.extractTransferables`. The docblock spells
  out that this is a Node-primitive validation, NOT a test of the
  current package's behavior — today's queue channel is entirely
  same-process and the test exists as a navigational marker for a
  future hosted-in-thread variant.
@sroussey sroussey force-pushed the claude/stoic-bell-RLJWT branch from afe3943 to 184128f Compare June 4, 2026 05:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants