
fix(telegram): surface docker-isolation session id + isolation immediately (#1946) #1948

Merged: konard merged 8 commits into main from issue-1946-94305ac66541 on Jun 19, 2026

konard commented Jun 19, 2026

Summary

Fixes the in-repo part of #1946: when the Telegram bot launches a task with
--isolation docker, the session UUID and isolation backend are now surfaced
(and the session tracked) immediately, before the potentially hour-long
docker image pull and container startup finish. Previously they only appeared
once the blocking start-command launch returned, so for the whole startup
window the run showed an info-less 🔄 Starting... and was not addressable by
/watch, /log or /status (the operator had to fall back to $ --list to
discover the id). This is why --isolation screen felt fine but docker did not:
screen launches in milliseconds, whereas docker's multi-GB pull made the same
post-launch timing pathological.

A full timeline, requirements list (R1–R10), root-cause analysis with log
evidence, alternatives considered, and solution plans are in
docs/case-studies/issue-1946/.

Root cause

The isolation session UUID is generated locally before start-command launches
the container, but the old flow generated it, then await executeWithIsolation(...) ran to completion, and only on success called
trackSession() and edited the message to the session-aware form. start-command's
detached docker backend does not return until the image is pulled and the
container is running (~1 hour for konard/hive-mind-dind:2.0.6), so the session
stayed invisible and untracked for the entire pull.
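A minimal sketch of the old ordering makes the root cause concrete. It is illustrative only: `oldExecuteAndUpdate`, the `launch` callback (standing in for `executeWithIsolation(...)`) and the `deps` object with `trackSession` / `editMessage` are hypothetical, not Hive Mind's actual signatures.

```javascript
// Hypothetical sketch of the OLD ordering (not the real Hive Mind code).
// The session id exists up front, but tracking and the message edit
// both wait for the launch promise to settle.
async function oldExecuteAndUpdate({ launch, deps, sessionId, isolation }) {
  // The UUID is generated locally before the container starts...
  const sessionInfo = { sessionId, isolation };

  // ...but execution blocks here until the (possibly hour-long) launch returns.
  const result = await launch();

  if (result.success) {
    // Only now does the session become visible to /watch, /log and /status.
    deps.trackSession(sessionInfo);
    await deps.editMessage(`Session: ${sessionId}\n🔒 Isolation: ${isolation}`);
  }
  return result;
}
```

If `launch()` stays pending, as it does during a long image pull, `trackSession` is never reached, so nothing can find the run.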

What changed

  • src/work-session-formatting.lib.mjs: formatStartingWorkSessionMessage()
    optionally renders the Session: and 🔒 Isolation: lines on the
    🔄 Starting... message (backward compatible: with no session it renders
    exactly as before).
  • src/telegram-command-execution.lib.mjs: buildExecuteAndUpdateMessage()
    now, for isolation backends, generates the UUID, builds sessionInfo,
    tracks the session and edits the message to the session-aware Starting…
    form before awaiting the launch, then awaits it (detached, so other bot
    commands keep working); on launch failure it untracks the optimistic session
    and clears sessionInfo.
  • src/session-monitor.lib.mjs: new exported untrackSession() drops an
    optimistically tracked session from the in-memory map and durable store
    without emitting a session_completed audit event (the session never ran).
  • src/telegram-bot.mjs: wires untrackSession +
    formatStartingWorkSessionMessage into the shared execution path, so
    /solve, /hive and /task all inherit the fix (R6).
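The backward-compatible formatting idea can be sketched as a function whose session argument is optional. `formatStartingMessage`, its single parameter and the exact line layout are assumptions for illustration; the real `formatStartingWorkSessionMessage()` may take different arguments.

```javascript
// Hypothetical, simplified formatter illustrating the "optional lines" pattern.
function formatStartingMessage(session = null) {
  const lines = ['🔄 Starting...'];

  // Backward compatible: with no session info the output is unchanged.
  if (session && session.sessionId) {
    lines.push(`Session: ${session.sessionId}`);
    if (session.isolation) {
      lines.push(`🔒 Isolation: ${session.isolation}`);
    }
  }
  return lines.join('\n');
}
```

Callers that pass no session get the old text unchanged; only callers that already know the session get the two extra lines.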

How to reproduce / verify

Before: launching /solve <url> --isolation docker left 🔄 Starting... with no
session info for the whole image pull, and /watch, /log and /status could not find
the run. After: the Session: + 🔒 Isolation: docker lines appear immediately
and the session is addressable during startup.

Automated regression: tests/test-issue-1946-docker-isolation-async.mjs
proves (via event ordering with a deferred launch promise) that the session is
tracked and shown before the launch resolves, stays tracked on success, and
is untracked on failure. tests/test-issue-1860-docker-isolation.mjs (33
assertions) updated for the new deps and still passes. Full default suite: 274/274
files pass.
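The event-ordering technique the regression test relies on can be sketched with a deferred launch promise. This is a reconstruction of the idea, not the contents of tests/test-issue-1946-docker-isolation-async.mjs: `newExecuteAndUpdate`, `makeDeferred`, `makeRecordingDeps` and the `deps` shape are hypothetical stand-ins for the real `buildExecuteAndUpdateMessage()` wiring. Because the test controls when the launch settles, it can inspect the recorded events while the "pull" is still in progress.

```javascript
// Hypothetical NEW ordering: track and edit optimistically, then await launch.
async function newExecuteAndUpdate({ launch, deps, sessionId, isolation }) {
  let sessionInfo = { sessionId, isolation };
  deps.trackSession(sessionInfo); // addressable immediately
  await deps.editMessage(`Session: ${sessionId}\n🔒 Isolation: ${isolation}`);

  let result;
  try {
    result = await launch(); // may take an hour for a docker image pull
  } catch (error) {
    result = { success: false, error };
  }

  if (!result.success) {
    deps.untrackSession(sessionId); // roll back the optimistic entry
    sessionInfo = null;
  }
  return { result, sessionInfo };
}

// A promise whose settlement the test controls from outside.
function makeDeferred() {
  let resolve, reject;
  const promise = new Promise((res, rej) => { resolve = res; reject = rej; });
  return { promise, resolve, reject };
}

// Records every dependency call, in order, so a test can assert sequencing.
function makeRecordingDeps(events) {
  return {
    trackSession: s => events.push(`track:${s.sessionId}`),
    untrackSession: id => events.push(`untrack:${id}`),
    editMessage: async () => { events.push('edit'); },
  };
}
```

Resolving the deferred with `{ success: true }` should leave the log at `track, edit`; rejecting it should append an `untrack` entry and clear `sessionInfo`.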

Upstream halves — fixed, released, and now pinned

The other two #1946 problems live outside this repo's code. They were reported
upstream, fixed there, released, and this PR pins the fixed versions in
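Dockerfile / Dockerfile.dind:

  • start-command 0.29.2 (link-foundation/start#138 / PR #139) records the docker
    image-preparation phase in the $ session log; pinned 0.29.1 -> 0.29.2.
  • box-dind 2.3.4 (link-foundation/box#106 / PR #107) verifies that host-image
    passthrough actually seeded the nested daemon instead of silently
    re-downloading; Dockerfile.dind now uses konard/box-dind 2.3.5, with
    konard/box bumped to 2.3.5 in lockstep.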

Each upstream issue includes a reproducer, a workaround, and a code-level fix
suggestion (bodies under docs/case-studies/issue-1946/upstream/). Hive Mind's
existing verbose diagnostics (preflightDockerIsolation,
logDockerIsolationPostLaunchDiagnostics) are retained so the next iteration can
confirm those from data.

Closes #1946

Adding .gitkeep for PR creation (default mode).
This file will be removed when the task is complete.

Issue: #1946
konard self-assigned this Jun 19, 2026
konard added 2 commits June 19, 2026 07:04
…ately (#1946)

When the Telegram bot launched a task with --isolation docker, the session
UUID and isolation backend were only shown (and the session only tracked)
AFTER the blocking start-command launch returned. Because the detached docker
backend does not return until the multi-GB image is pulled and the container is
running (~1 hour for konard/hive-mind-dind), the message stayed an info-less
'Starting...' and the run was unaddressable by /watch, /log and /status for the
whole startup window.

- work-session-formatting: formatStartingWorkSessionMessage now optionally
  renders the Session: and Isolation: lines on the Starting... message
  (backward compatible when no session is known).
- telegram-command-execution: buildExecuteAndUpdateMessage now generates the
  UUID, builds sessionInfo, tracks the session and edits the message to the
  session-aware Starting... form BEFORE awaiting the launch; on launch failure
  it untracks the optimistic session and clears sessionInfo.
- session-monitor: new exported untrackSession() removes an optimistically
  tracked session from the in-memory map and durable store without emitting a
  session_completed audit event.
- telegram-bot: wires untrackSession + formatStartingWorkSessionMessage into
  the shared execution path (so /solve, /hive and /task all inherit the fix).

Adds tests/test-issue-1946-docker-isolation-async.mjs (event-ordering proof
that the session is tracked + shown before the launch resolves, and untracked
on failure) and updates tests/test-issue-1860-docker-isolation.mjs deps.

Adds docs/case-studies/issue-1946/ (timeline, requirements, root causes,
solution plans, raw evidence) and reports the upstream image-prep-log gap
(link-foundation/start#138) and host-image re-download (link-foundation/box#106).

Refs #1946
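The difference between untracking an optimistic session and completing a real one can be sketched with an in-memory `Map` plus a stand-in durable store. `SessionRegistry`, `completeSession`, and the `store` / `audit` interfaces are hypothetical; only the rule itself (remove from both places, but emit no `session_completed` audit event) comes from this commit.

```javascript
// Hypothetical registry illustrating track / complete / untrack semantics.
class SessionRegistry {
  constructor(store, audit) {
    this.sessions = new Map(); // in-memory view used to answer /watch, /log, /status
    this.store = store;        // durable store stand-in: any Map-like object
    this.audit = audit;        // audit sink exposing append(event)
  }

  trackSession(info) {
    this.sessions.set(info.sessionId, info);
    this.store.set(info.sessionId, info);
  }

  // A real run ended: remove it and record that it completed.
  completeSession(sessionId) {
    if (!this.sessions.has(sessionId)) return false;
    this.sessions.delete(sessionId);
    this.store.delete(sessionId);
    this.audit.append({ type: 'session_completed', sessionId });
    return true;
  }

  // The launch failed before the session ever ran: roll back without
  // writing any audit event, since nothing actually completed.
  untrackSession(sessionId) {
    const existed = this.sessions.delete(sessionId);
    this.store.delete(sessionId);
    return existed;
  }
}
```

Keeping the two removal paths separate means the audit trail only ever records sessions that actually ran.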
konard changed the title from "[WIP] --isolation docker issues" to "fix(telegram): surface docker-isolation session id + isolation immediately (#1946)" Jun 19, 2026
konard marked this pull request as ready for review June 19, 2026 07:06
konard commented Jun 19, 2026

Working session summary

I'll wait for the background CI-monitor task to notify completion. In the meantime, here's the status of the work.

Summary of work completed

I've resolved the actionable portion of issue #1946 in PR #1948 and filed the upstream issues for the parts that live outside this repository.

Root cause (fixed here): The Telegram bot generated the docker-isolation session UUID locally before launching the container, but only tracked the session and showed the Session:/Isolation: lines after the blocking start-command launch returned. Since the detached docker backend doesn't return until the multi-GB image is pulled (~1 hour for konard/hive-mind-dind:2.0.6), the run stayed an info-less 🔄 Starting... and was unaddressable by /watch, /log, /status for the whole startup window.

Code changes:

  • work-session-formatting.lib.mjs: formatStartingWorkSessionMessage() now renders Session: + 🔒 Isolation: on the Starting message (backward compatible).
  • telegram-command-execution.lib.mjs: tracks the session and edits the message to the session-aware form before awaiting the launch; untracks on failure.
  • session-monitor.lib.mjs: new exported untrackSession().
  • telegram-bot.mjs: wires both into the shared path so /solve, /hive, /task all inherit the fix.

Tests: New tests/test-issue-1946-docker-isolation-async.mjs (5/5, proves ordering), updated test-issue-1860 (33/33), full default suite 274/274 files pass, lint clean.

Case study: docs/case-studies/issue-1946/ with timeline, R1–R10 requirements, root causes, alternatives, raw evidence (the 546-byte log 7 minutes in is the smoking gun for the missing image-prep logs).

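Upstream issues filed (the parts outside this repo's code):

  • link-foundation/start#138: the image-preparation log gap (the near-empty 546-byte log while the image is still pulling).
  • link-foundation/box#106: the host-image re-download inside the dind container.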

PR #1948 is updated (title + description), marked ready for review, with main (2.0.7) merged in and a changeset added. The fix commit already passed CI; I'm waiting on the merge commit's run to confirm green.


This summary was automatically extracted from the AI working session output.

konard commented Jun 19, 2026

🤖 Solution Draft Log

This log file contains the complete execution trace of the AI solution draft process.

💰 Cost estimation:

  • Public pricing estimate: $6.664835
  • Calculated by Anthropic: $7.359548
  • Difference: $0.694714 (+10.42%)

📊 Context and tokens usage:

Claude Opus 4.8: (2 sub-sessions)

  1. 116.2K / 1M (12%) input tokens, 42.1K / 128K (33%) output tokens
  2. 79.5K / 1M (8%) input tokens, 18.6K / 128K (15%) output tokens

Total: (23.0K new + 185.3K cache writes + 6.5M cache reads) input tokens, 67.8K output tokens, $6.219655 cost

Claude Haiku 4.5:

  • 130.9K / 200K (65%) input tokens, 12.6K / 64K (20%) output tokens

Total: (174 new + 130.7K cache writes + 2.2M cache reads) input tokens, 12.6K output tokens, $0.445180 cost

🤖 Models used:

  • Tool: Anthropic Claude Code
  • Requested: opus
  • Main model: Claude Opus 4.8 (claude-opus-4-8)
  • Additional models:
    • Claude Haiku 4.5 (claude-haiku-4-5-20251001)

📎 Log file uploaded as Gist (5388KB)


Now working session is ended, feel free to review and add any feedback on the solution draft.

konard commented Jun 19, 2026

✅ Ready to merge

This pull request is now ready to be merged:

  • All CI checks have passed
  • No merge conflicts
  • No pending changes

Monitored by hive-mind with --auto-restart-until-mergeable flag

konard commented Jun 19, 2026

Get latest changes from default branch.

Check latest versions of start-command and box, continue fixing, double check consistency and correctness.

konard marked this pull request as draft June 19, 2026 17:59

konard commented Jun 19, 2026

🤖 AI Work Session Started

Starting automated work session at 2026-06-19T17:59:54.576Z

The PR has been converted to draft mode while work is in progress.

This comment marks the beginning of an AI work session. Please wait for the session to finish, and provide your feedback.

konard added 4 commits June 19, 2026 18:01
 upstream halves)

The two out-of-repo halves of #1946 are now fixed upstream and released:

- start-command 0.29.2 (link-foundation/start#138 / PR #139) records the docker
  image-preparation phase (docker pull / dind boot) in the $ session log, so
  $ --upload-log no longer returns a near-empty 546-byte log while a multi-GB
  image is still pulling. Bumped the pin 0.29.1 -> 0.29.2 in Dockerfile and
  Dockerfile.dind.
- box-dind 2.3.4 (link-foundation/box#106 / PR #107) makes the dind entrypoint
  verify host-image passthrough actually seeded the nested daemon and report
  'finished WITH WARNINGS' instead of silently re-downloading ~30 GB. Bumped
  Dockerfile.dind base konard/box-dind 2.3.2 -> 2.3.5 (latest patch).

Updated the case study Upstream Follow-ups section and the changeset to record
the fixed/released state and the pinned versions.
…ion tests

Dockerfile and coolify/Dockerfile pin konard/box, which the repo convention keeps
in lockstep with the DinD base-image release (asserted by
tests/test-docker-box-migration.mjs). Bumped both 2.3.2 -> 2.3.5 to match the
Dockerfile.dind base bump, and updated the version assertions in
test-docker-dind-variant.mjs (box-dind 2.3.5) and test-docker-box-migration.mjs
(box 2.3.5).
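The lockstep convention can also be checked mechanically. This sketch is not tests/test-docker-box-migration.mjs; it assumes each Dockerfile has a line of the form `FROM konard/box:<tag>` or `FROM konard/box-dind:<tag>` (an assumption about the files' layout), takes file contents as strings rather than reading paths, and throws if the tags disagree. `extractBoxTag` and `assertLockstep` are hypothetical helper names.

```javascript
// Hypothetical lockstep check over Dockerfile contents (strings, not paths).
function extractBoxTag(dockerfileText) {
  // Matches e.g. "FROM konard/box:2.3.5" or "FROM konard/box-dind:2.3.5 AS base".
  const match = dockerfileText.match(/^FROM\s+konard\/box(?:-dind)?:(\S+)/m);
  return match ? match[1] : null;
}

function assertLockstep(files) {
  const tags = Object.entries(files).map(([name, text]) => [name, extractBoxTag(text)]);

  const missing = tags.filter(([, tag]) => tag === null).map(([name]) => name);
  if (missing.length > 0) {
    throw new Error(`no konard/box pin in: ${missing.join(', ')}`);
  }

  const distinct = new Set(tags.map(([, tag]) => tag));
  if (distinct.size !== 1) {
    throw new Error(`box versions out of lockstep: ${JSON.stringify(Object.fromEntries(tags))}`);
  }
  return [...distinct][0]; // the single shared version
}
```

Leaving, say, coolify/Dockerfile at 2.3.2 while the others move to 2.3.5 makes `assertLockstep` throw.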
… and pinned

Update the issue-1946 case study's Online-And-Source-Facts and Root-Cause-3
sections so they no longer describe the image-prep log gap and host-image
re-download as still-open: start-command 0.29.2 and box-dind 2.3.4 fixed them and
this repo now pins 0.29.2 / box-dind 2.3.5.
konard commented Jun 19, 2026

Working session summary

I have a background waiter (bl09gto9t) that will notify me when the suite completes. The suite is progressing through the final slow CLI integration tests with everything passing so far. I'll wait for that notification before confirming the final green result and wrapping up.

Summary of the state: all changes are committed and pushed to issue-1946-94305ac66541 (latest commit c182a01a), the two earlier targeted runs (full suite #1 minus the now-fixed version test, plus the 1946/1860/docker tests) are green, and the only outstanding item is confirming this final full-suite run finishes clean after the test-assertion updates.


This summary was automatically extracted from the AI working session output.

konard marked this pull request as ready for review June 19, 2026 18:15

konard commented Jun 19, 2026

🤖 Solution Draft Log

This log file contains the complete execution trace of the AI solution draft process.

💰 Cost estimation:

  • Public pricing estimate: $4.589036
  • Calculated by Anthropic: $4.915642
  • Difference: $0.326606 (+7.12%)

📊 Context and tokens usage:

  • 99.8K / 1M (10%) input tokens, 31.1K / 128K (24%) output tokens

Total: (9.2K new + 87.1K cache writes + 6.4M cache reads) input tokens, 31.1K output tokens, $4.589036 cost

🤖 Models used:

  • Tool: Anthropic Claude Code
  • Requested: opus (claude-opus-4-8)
  • Model: Claude Opus 4.8 (claude-opus-4-8)

📎 Log file uploaded as Gist (2500KB)


Now working session is ended, feel free to review and add any feedback on the solution draft.

konard commented Jun 19, 2026

✅ Ready to merge

This pull request is now ready to be merged:

  • All CI checks have passed
  • No merge conflicts
  • No pending changes

Monitored by hive-mind with --auto-restart-until-mergeable flag

konard merged commit 1dca2cc into main Jun 19, 2026
25 checks passed

Development

Successfully merging this pull request may close these issues.

--isolation docker issues
