Skip to content

feat: progress tracking, error classification, and CLI UI (rebased on n-ary search)#308

Closed
tonyxiao wants to merge 34 commits intotx/nary-search-paginationfrom
tx/progress-on-nary
Closed

feat: progress tracking, error classification, and CLI UI (rebased on n-ary search)#308
tonyxiao wants to merge 34 commits intotx/nary-search-paginationfrom
tx/progress-on-nary

Conversation

@tonyxiao
Copy link
Copy Markdown
Collaborator

Summary

Merges the progress tracking / error classification / CLI UI work from PR #296 onto the n-ary search pagination branch (PR #307), reconciling both feature sets cleanly.

Key reconciliation decisions:

  • Simplified status model: TraceStreamStatus.status narrowed from ['start', 'running', 'complete', 'range_complete'] to ['started', 'complete']. Range completions use a separate range_complete field with lifecycle status: 'started', making lifecycle and ranges orthogonal.
  • Errors orthogonal to lifecycle: Non-global errors emit stream_status: complete after the error trace. Global errors (auth 401/403) return early without complete emission.
  • Merged progress tracking: Both PR 307's completedRanges / mergeRanges utility and PR 296's change-driven emitIfStatusChanged / CatalogMessage / TraceGlobalProgress coexist cleanly.
  • Error classification: globalPermanent vs streamPermanent — only global permanent errors park the Temporal workflow.

What's new (from PR #296):

  • sync-progress-state.ts — pure reducer for processing SyncOutput into EofPayload
  • sync-ui.tsx — Ink/React CLI progress display
  • --progress CLI flag for interactive terminal output
  • CatalogMessage emitted at sync start so UI knows all streams upfront
  • TraceGlobalProgress with cumulative stats across runs
  • renderSyncProgress replaces verbose formatEof in HTTP API

Test plan

  • pnpm build passes cleanly
  • @stripe/sync-protocol — 50/50 tests pass
  • @stripe/sync-source-stripe — 102/102 tests pass
  • @stripe/sync-engine — 231/231 tests pass (Docker-dependent sync.test.ts excluded)
  • @stripe/sync-service — 26/26 tests pass (2 skipped, Docker-dependent)
  • E2E tests with Docker + Stripe keys

Made with Cursor

tonyxiao and others added 30 commits April 16, 2026 22:54
…fied stream status

Simplify stream lifecycle to just started/complete — errors are now orthogonal
(a stream can be complete with errors). Replace interval-based progress emission
with event-driven emission on status transitions. Add cumulative tracking across
runs (record count, request count, elapsed time) persisted via engine state.
Rename TraceProgress → TraceGlobalProgress and rows_per_second → records_per_second
for consistency. Emit catalog as first message so the UI knows all streams upfront.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Committed-By-Agent: claude
Extract sync display state as a pure reducer over SyncOutput messages
that accumulates into an EofPayload shape. The renderer is a stateless
function of (EofPayload, catalog) -> string[].

- Add --progress flag to sync command (auto-enabled on TTY)
- Add --base-url flag for QA/non-prod Stripe API endpoints
- Extract sync-progress-state.ts: createSyncDisplayState() reducer +
  renderSyncProgress() renderer with emoji status groups
- Deduplicate: app.ts formatEof now delegates to renderSyncProgress
- Suppress raw source stream_status traces in trackProgress (engine
  re-emits enriched versions)
- Add scripts/test-all-accounts.sh for multi-account testing

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Committed-By-Agent: claude
…al errors, system_error reclassification

Source always emits stream_status:complete on error (errors are orthogonal to
lifecycle). Adds 'Unrecognized request URL' to skippable patterns for treasury.
Stream-scoped permanent errors no longer park the entire workflow — only global
permanent errors (bad API key, invalid config) do. Reclassifies system_error
using isRetryableHttpError: 429/5xx/network→transient, everything else→permanent.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Committed-By-Agent: claude
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Committed-By-Agent: claude
Committed-By-Agent: claude
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Committed-By-Agent: claude
Introduces the protocol design for sync runs, covering:
- start/end messages replacing EOF-based lifecycle
- Engine-managed time ranges with binary search subdivision
- Segment-level progress tracking with merged synced_ranges
- Error levels as discriminated union (global/stream/segment/transient)
- ProgressPayload as the unified shape for run and request stats
- Source state simplified to pure cursors, engine owns range tracking
- Frozen upper bounds via sync_run_id continuations

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Committed-By-Agent: claude
Update remaining failure_type references to error_level discriminated
union. Fix field names and remove errors from segment example (errors
live on ProgressPayload, not inline on segments).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Committed-By-Agent: claude
…r levels

- Segments are source-internal; engine only tracks completed_ranges
- completed_ranges derived from source_state messages with time_range
- Error levels reduced to global/stream/transient (no segment level)
- SyncError is discriminated union on error_level
- completed_ranges optional on StreamProgress
- Rates (records_per_second, states_per_second) nested under rates field
- source_state messages carry time_range for engine range tracking

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Committed-By-Agent: claude
Describes how the Stripe source manages pagination within engine-assigned
time ranges: initialization, density probing, sub-range splitting, cursor
tracking, resumption, subdivision, and completion signaling.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Committed-By-Agent: claude
- Remove density probing — source starts with full range and subdivides
- Rename ranges → remaining (just the work left to do)
- Describe subdivision: split unpaginated portion of a range into N parts
- Full walkthrough example showing subdivision across requests
- Cursor tracking: null = not started, string = resume point, removed = done

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Committed-By-Agent: claude
…flat rates

- EngineState contains run_progress: ProgressPayload (not flattened)
- completed_ranges optional on StreamProgress
- Rates flat on ProgressPayload (records_per_second, states_per_second)
- Remove old sync-lifecycle-source-stripe.md from docs/ root (moved to docs/engine/)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Committed-By-Agent: claude
- max_concurrent_streams (configurable, default 5, capped at catalog size)
- max_requests_per_second (inferred: live=20, test=10)
- max_segments_per_stream (derived: rps / effective_streams)
- Examples showing budget distribution across different scenarios
- Single-stream syncs get the full rate limit budget

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Committed-By-Agent: claude
Events endpoint uses the same time_range + remaining model as all other
streams. Live event polling is experimental and opt-in, stored in
source.global separately from backfill cursor logic.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Committed-By-Agent: claude
remaining describes the work left to do in source state.
max_segments_per_stream is the config limit on fan-out.
Different concepts, different names.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Committed-By-Agent: claude
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Committed-By-Agent: claude
- Subdivision happens between requests, not mid-request
- Source subdivides if a range didn't complete in previous request
- range_complete is a stream_status subtype (Stripe polymorphism pattern)
- Full example shows actual messages emitted across 3 requests
- stream_status uses 'start' not 'started'

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Committed-By-Agent: claude
Each Stripe API page returns max 100 records. Example now shows
page-by-page pagination with state checkpoints after each page.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Committed-By-Agent: claude
Engine emits log messages (info/warn/error) alongside progress.
Tolerant processing — anomalies are logged as warnings, not rejected.
Defines the full set of engine log messages by level.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Committed-By-Agent: claude
Engine: no redundant info logs (progress stream covers that). Only
warn (anomalies) and error (failures).

Source: emits info logs for real-time rps, warn logs for rate limits
and retries.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Committed-By-Agent: claude
fields is redundant with json_schema (use schema to express field
projection). system_columns is a destination concern, not protocol.

Also made time_range.gte optional (omit for "from the beginning").

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Committed-By-Agent: claude
ConfiguredStream.stream.name → ConfiguredStream.name. No more nesting
of Stream inside ConfiguredStream.

TODO: move metadata (api_version, account_id, live_mode) out of
per-stream into source_config or destination injection.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Committed-By-Agent: claude
- StartPayload.state → starting_state
- EndPayload.state → ending_state
- Round-trip is self-documenting: end.ending_state → start.starting_state
- Progress: client generally doesn't need a reducer, but can diff for deltas

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Committed-By-Agent: claude
Future: destination can report inserted/updated/deleted counts per
stream (e.g. Postgres upsert). Extension point noted in StreamProgress.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Committed-By-Agent: claude
All three are first-class fields. For now inserted = record_count,
updated = 0, deleted = 0. Destination can report real values when
it supports upsert/delete tracking.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Committed-By-Agent: claude
change_count = total records processed (always known by engine).
insert_count + update_count + delete_count = change_count when
destination reports the breakdown. All zero until then.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Committed-By-Agent: claude
Either change_count (when breakdown unavailable) or insert_count +
update_count + delete_count (when destination reports). All optional.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Committed-By-Agent: claude
It's a count of record messages, not an interpretation of what they mean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Committed-By-Agent: claude
record_count is always known (engine counts record messages).
insert/update/delete are optional enrichment from the destination.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Committed-By-Agent: claude
Reserved for when destinations report per-operation counts.
Not implemented yet.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Committed-By-Agent: claude
tonyxiao and others added 4 commits April 17, 2026 07:27
Committed-By-Agent: codex
Co-authored-by: codex <noreply@openai.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Committed-By-Agent: claude
#296

Merge the "massive progress fix" (PR #296) onto the n-ary search pagination
branch (PR #307), reconciling the two feature sets:

Protocol:
- Simplify TraceStreamStatus.status to ['started', 'complete']
- Errors are orthogonal to lifecycle — a stream can be complete with errors
- range_complete is now a separate field on TraceStreamStatus, not a status value
- Add CatalogMessage to SyncOutput, TraceGlobalProgress with cumulative stats

Engine:
- Progress tracker emits catalog upfront, stream_status + global_progress
  pairs on transitions, and tracks completed_ranges via mergeRanges
- New sync-progress-state.ts reducer and sync-ui.tsx Ink/React CLI component
- CLI gains --progress flag for interactive terminal progress display

Service:
- Error classification distinguishes globalPermanent vs streamPermanent
- Only global permanent errors (bad API key, invalid config) park the workflow
- Stream-scoped permanent errors skip the stream on resume

Source:
- Remove label from HttpRetryOptions
- Use isRetryableHttpError for failure type classification
- Emit complete status after non-global errors (errors orthogonal to lifecycle)
- Add 'Unrecognized request URL' to skippable error patterns

Made-with: Cursor
Committed-By-Agent: cursor
Made-with: Cursor
Committed-By-Agent: cursor
Keep the lifecycle docs aligned with the current rollout by preserving `eof` as
the backward-compatible terminal message and moving the explicit `start` / `end`
envelope migration into its own follow-up plan.

Constraint: Preserve current eof-based sync APIs while documenting continuation via has_more
Rejected: Keep start/end in the same lifecycle spec | mixed current behavior with a future migration
Confidence: high
Scope-risk: narrow
Not-tested: Runtime behavior; docs-only changes
Made-with: Cursor
Committed-By-Agent: cursor
@tonyxiao tonyxiao force-pushed the tx/nary-search-pagination branch from 44c421b to 58dfc8a Compare April 20, 2026 02:06
@tonyxiao tonyxiao closed this Apr 20, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant