Skip to content

feat(api): /sessions filters by type and paginates by cursor#677

Merged
sheepbox8646 merged 17 commits into
mainfrom
feat/split-sse
Jun 19, 2026
Merged

feat(api): /sessions filters by type and paginates by cursor#677
sheepbox8646 merged 17 commits into
mainfrom
feat/split-sse

Conversation

@Menci

@Menci Menci commented Jun 18, 2026

Copy link
Copy Markdown
Member

Summary

Make GET /bots/{bot_id}/sessions filterable by session type and cursor-paginated. Heartbeat / schedule / subagent sessions are now hidden from the default response so the client never has to scan thousands of system-internal entries to render the sidebar.

This is PR 1 of 3 in the SSE / chat-list refactor chain:

  1. (this PR) Backend: sessions pagination + type filter
  2. Backend SSE protocol split + /messages requires session_id
  3. Frontend rebuild around per-session SSE

API

GET /bots/{bot_id}/sessions now accepts:

query default notes
types chat,discuss,acp_agent comma-separated; uses the new session.UserFacingSessionTypes whitelist server-side
limit 50 clamped to [1, 200]
cursor (empty) opaque base64-url of <updated_at_rfc3339nano>|<id>

Response gains next_cursor:

{ "items": [...], "next_cursor": "MjAyNi0wMy0yNFQxMDo0MTowNC41Mz..." }

The cursor key is (updated_at, id) so pages stay stable when many rows share an updated_at.

What this fixes

On a bot with months of activity, bot_sessions accumulates ~30 heartbeat rows per hour (the heartbeat scheduler). One real instance had 2246 heartbeat sessions on the bot vs 25 chat/discuss — the unfiltered, unpaginated response returned all of them. The client filtered heartbeat out client-side, but every subsequent per-event handler still did sessions.value.some(...) on the full 2271-entry reactive array, which compounded into seconds of frozen UI on busy bots.

After this PR the server returns only the 25 user-facing sessions by default, so sessions.value is small enough that the existing client code stops being O(thousands) per SSE event. The frontend changes that actually use the new query params land in PR 3.

Compatibility

  • Old frontend: stops seeing heartbeat sessions in /sessions (the client filter for them was already there, so no visible change). Loses access to the previously implicit "all sessions" view; nothing reads it.
  • New frontend: opts into pagination by passing cursor.

No database migration; query additions only.

Test plan

  • go test ./internal/session/... ./internal/handlers/... — passes (4 new service tests, 7 new handler tests).
  • Default types filter: GET /sessions returns only chat / discuss / acp_agent.
  • Explicit types: GET /sessions?types=chat returns only chat.
  • Unknown types → 400.
  • Cursor round-trip: fetch first page, follow next_cursor, no duplicates, terminates with empty next_cursor.
  • Limit clamping: limit=999 → clamped to 200.
  • Empty response omits next_cursor.

Notes

  • UserFacingSessionTypes is exported from internal/session so PR 2 can use it on the SSE side without duplicating the whitelist.
  • Old ListByBot / ListByBotAndCreatedByUser service methods are intentionally untouched (no callers remain in this PR's handler, but a follow-up cleanup is safer once PR 2 confirms nothing else reaches for them).

GET /bots/{bot_id}/sessions now accepts:
  - types=chat,discuss,acp_agent (defaults to user-facing types)
  - limit=50 (1..200)
  - cursor=<opaque>

Response adds `next_cursor`. System-internal session types (heartbeat,
schedule, subagent) no longer leak to the default list.

Underlies the SSE refactor that follows.
@Menci Menci changed the title feat: split SSE into per-session messages + bot-wide sessions activity feat(api): /sessions filters by type and paginates by cursor Jun 18, 2026
Menci added 16 commits June 19, 2026 12:52
The sqlc-generated SQLite query mixed `sqlc.slice(types)` with explicit
numbered placeholders (`?N`). At runtime sqlc-sqlite expands the slice
into anonymous `?` placeholders that SQLite auto-numbers from the next
free index — which collides with the explicit `?N` we used for the
cursor and limit binds. The result was either a `datatype mismatch`
failure or a silently misrouted bind, so paging was effectively broken.

The same SQL also compared the cursor (RFC3339Nano) against
`updated_at`, which SQLite stores as `YYYY-MM-DD HH:MM:SS` via
`CURRENT_TIMESTAMP`. Lexicographic comparison ('T' > ' ') means every
row looked "less than" any cursor — keyset pagination devolved into
"start from the head every time".

Both go away by hand-rolling the SQLite shim:

- The query uses only sequential anonymous `?`, so binding order is
  unambiguous regardless of how many types the caller passes.
- The cursor timestamp is funneled through
  `strftime('%Y-%m-%d %H:%M:%S', ?)` so it lines up with the stored
  text format.

A new integration test exercises the SQLite shim against an in-memory
database and walks both pages, covering the placeholder collision and
the cursor-format mismatch in a single check.
The non-admin path filtered the DB result by `canAccessSession` and
then derived `next_cursor` from the filtered slice. If a full page of
DB rows was discarded by the filter, `len(sessions) < limit` ⇒
empty `next_cursor` ⇒ the caller stopped paging even though older
rows still existed on disk. The cursor's job is to record the DB
position to resume from; filter survivorship has nothing to do with it.

next_cursor now always reflects the DB resume position when the DB
returned a full page, regardless of whether the permission filter
shrank that page.

Also tightens cursor decoding to reject a syntactically-valid base64
payload whose UUID portion does not parse, so the failure surfaces as
a 400 from the handler instead of a 500 from the service layer.
The new paged /sessions endpoint orders by `updated_at DESC, id DESC`
with `bot_id = ? AND deleted_at IS NULL`. None of the existing
bot_sessions indexes cover that access path, so the planner falls back
to a full scan of the bot's sessions plus an in-memory sort — fine on
small fixtures, painful on real installs.

Add a partial composite index on (bot_id, updated_at DESC, id DESC)
filtered to active rows. Updates both baseline migrations and ships an
incremental migration pair for environments already on the schema.
`toSessionFromPagedRow` and `toSessionFromUserPagedRow` were
byte-identical projections of two sqlc-generated row structs that
happen to share the same shape. Funnel them through a common
`pagedColumns` struct and a single `sessionFromPagedColumns` helper
so the projection lives in one place.

Express `IsUserFacingType` in terms of `UserFacingSessionTypes` via
`slices.Contains` so the constant stays the single source of truth
for which session types are user-facing.
- prefer errors.New over fmt.Errorf for static error messages
- close rows.Close errors explicitly in the hand-rolled shim
- drop unused method receiver on the test grants stub
A SessionCursor with only the timestamp or only the id populated cannot
position a keyset query, but the previous AND-based IsZero let such a
half-built cursor through into pagedCursorParams where the empty half
became a malformed binding. The HTTP decoder already rejects empty
halves, so this only affects internal callers that construct a cursor
by hand; flip the predicate to OR so every callsite funnels through the
no-cursor path instead.
Drop the strftime() wrapping in the paged session SQL: it silently
truncated sub-second precision from the Go side, while the Go formatter
was producing nanoseconds — confusing to read and pointless to compute
twice. Format the cursor timestamp directly at second precision so it
matches SQLite's TEXT storage from CURRENT_TIMESTAMP.

Also make parseSQLiteTimestamp return its error rather than collapse
unknown formats to an invalid Timestamptz; format drift should surface
loudly per the project's no-fake-robustness rule. Callers wrap with
the offending column name and raw text.
RFC3339Nano on the wire is preserved by Postgres but truncated to second
precision by SQLite; document that here so future readers don't try to
"fix" the asymmetry.
The existing paged test seeds rows a minute apart, so the
(updated_at = ? AND id < ?) branch of the cursor SQL was never
exercised. Add a case that inserts three rows sharing one
`updated_at`, pages through them with limit 2, and asserts page 2
returns the remaining row with no duplicates and no skips — that's
the exact bug the id tiebreak exists to prevent.
golangci-lint perfsprint catches fmt.Errorf with no args; the message
is static so errors.New is the right primitive.
…inel

Cycle-3 review nits:
- scanSessionPagedRows redundantly called rows.Close() while the two
  callers already deferred it. Dropped the explicit close so the
  defer is the single close path; future callers that follow the
  same defer pattern won't accidentally double-close.
- The "unsupported timestamp format" error was constructed with
  errors.New on every parse failure even though the message is
  static. Hoisted to a package-level sentinel errUnsupportedSQLiteTimestamp
  so callers can errors.Is-check and the lint flag goes away.
The keyset cursor in sessions_paged.go formats its bound timestamp at second
precision to match SQLite's CURRENT_TIMESTAMP TEXT storage. That coupling was
implicit; a future migration that upgraded bot_sessions.updated_at to
sub-second precision would have silently broken paging.

Pin the contract on both sides:
- Migration baseline now carries a comment next to bot_sessions.updated_at
  explaining the precision invariant and the cursor-formatter dependency.
- pagedCursorBindings comment now references the migration invariant and
  spells out the failure mode if either side changes alone.
- Drop the errUnsupportedSQLiteTimestamp sentinel — no caller errors.Is's
  it, so the indirection only added a moving piece.
- New TestSQLiteListSessionsByBotPagedWithinOneSecond walks paged listings
  across rows whose CURRENT_TIMESTAMP collapses to the same second; every
  row must be visited exactly once via the (updated_at, id) tiebreak.
- New TestSQLiteListSessionsByBotPagedSurfacesRouteJoinColumns asserts that
  route_metadata and route_conversation_type round-trip through the LEFT
  JOIN, locking down the 14-column scan order against silent reorderings.
Tighten the paged session list endpoint along several fronts that all touch
the same flow:

- Has-more without a tail request. The handler asks the SQL layer for
  limit+1 rows and returns items[:limit]; next_cursor fires when the extra
  row exists, so a client paging through an exact-multiple of `limit`
  no longer issues a useless final request that returns []. The service
  signature stays (..., limit int64) and the handler documents the +1 bump
  at the call site.
- Limit pipeline at int64 end-to-end. parseSessionLimitParam now returns
  int64; the service narrows once at the sqlc binding boundary via
  pagedLimitToInt32, which fails loudly if the caller hands it an
  out-of-range value. The //nolint:gosec on the old int32 cast is gone.
- SQLite limit binding cast to int64 to match the rest of the package's
  driver bindings.
- SessionCursor.IsZero() now means "neither half set". A partial cursor
  (only id or only updated_at) was previously treated like the zero value,
  which silently restarted pagination from the head and returned a
  duplicate page. pagedCursorParams now rejects the partial state as a
  programmer error — the handler-side decoder rejects 400 inputs already,
  so any partial state at the service boundary is internal-construction
  bug.
- listSessionsResponse comment now documents the empty-next_cursor
  end-of-stream contract instead of paraphrasing the type name.
- UserFacingSessionTypes is now a function returning a fresh slice rather
  than an exported mutable package-level slice, so callers can't poison
  the source of truth.

Tests cover the new partial-cursor rejection, the IsZero contract, and
update the handler tests to drive the +1 has-more probe.
…riant

Initialize the sessions slice to an empty value so the JSON response
carries `"items": []` when a page is empty, sparing clients a null check.
Cover the behavior with a handler test.

Move the "next_cursor reflects DB position, not filter survivorship"
note onto trimPagedSessions where the invariant is actually enforced;
the non-manager branch no longer needs to restate it.
The 0023 SQLite migration adds a partial composite index on
bot_sessions, which made TestSQLiteModelEnableMigrationPreservesExisting
Models fail under merge-with-main: createPreModelEnableSchema seeded
only providers/models/model_variants, so when the test forced version
to 21 and migrated up, 0023 hit a "no such table: main.bot_sessions"
parse error before it could run.

Seed a minimal bot_sessions stub (id, bot_id, updated_at, deleted_at —
the columns 0023's index references) alongside the existing model
tables. This is fixture upkeep, not a schema change: future migrations
that touch other tables may need the same treatment.
@sheepbox8646 sheepbox8646 merged commit 13ee542 into main Jun 19, 2026
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants