fix(telegram): surface docker-isolation session id + isolation immediately (#1946) #1948
Adding .gitkeep for PR creation (default mode). This file will be removed when the task is complete. Issue: #1946
…ately (#1946)

When the Telegram bot launched a task with --isolation docker, the session UUID and isolation backend were only shown (and the session only tracked) AFTER the blocking start-command launch returned. Because the detached docker backend does not return until the multi-GB image is pulled and the container is running (~1 hour for konard/hive-mind-dind), the message stayed an info-less 'Starting...' and the run was unaddressable by /watch, /log and /status for the whole startup window.

- work-session-formatting: formatStartingWorkSessionMessage now optionally renders the Session: and Isolation: lines on the Starting... message (backward compatible when no session is known).
- telegram-command-execution: buildExecuteAndUpdateMessage now generates the UUID, builds sessionInfo, tracks the session and edits the message to the session-aware Starting... form BEFORE awaiting the launch; on launch failure it untracks the optimistic session and clears sessionInfo.
- session-monitor: new exported untrackSession() removes an optimistically tracked session from the in-memory map and durable store without emitting a session_completed audit event.
- telegram-bot: wires untrackSession + formatStartingWorkSessionMessage into the shared execution path (so /solve, /hive and /task all inherit the fix).

Adds tests/test-issue-1946-docker-isolation-async.mjs (event-ordering proof that the session is tracked + shown before the launch resolves, and untracked on failure) and updates tests/test-issue-1860-docker-isolation.mjs deps.

Adds docs/case-studies/issue-1946/ (timeline, requirements, root causes, solution plans, raw evidence) and reports the upstream image-prep-log gap (link-foundation/start#138) and host-image re-download (link-foundation/box#106).

Refs #1946
--isolation docker issues
Working session summary

I'll wait for the background CI-monitor task to notify completion. In the meantime, here's the status of the work.

Summary of work completed

I've resolved the actionable portion of issue #1946 in PR #1948 and filed the upstream issues for the parts that live outside this repository.

Root cause (fixed here): The Telegram bot generated the docker-isolation session UUID locally before launching the container, but only tracked the session and showed the

Code changes:

Tests: New

Case study:

Upstream issues filed (the parts outside this repo's code):

PR #1948 is updated (title + description), marked ready for review, with

This summary was automatically extracted from the AI working session output.
🤖 Solution Draft Log

This log file contains the complete execution trace of the AI solution draft process.

💰 Cost estimation:

📊 Context and tokens usage:

Claude Opus 4.8: (2 sub-sessions)
Total: (23.0K new + 185.3K cache writes + 6.5M cache reads) input tokens, 67.8K output tokens, $6.219655 cost

Claude Haiku 4.5:
Total: (174 new + 130.7K cache writes + 2.2M cache reads) input tokens, 12.6K output tokens, $0.445180 cost

🤖 Models used:

📎 Log file uploaded as Gist (5388KB)

The working session has now ended; feel free to review and add any feedback on the solution draft.
✅ Ready to merge

This pull request is now ready to be merged:

Monitored by hive-mind with --auto-restart-until-mergeable flag
This reverts commit 5628537.
Get latest changes from default branch. Check latest versions of start-command and box, continue fixing, double check consistency and correctness.
🤖 AI Work Session Started

Starting automated work session at 2026-06-19T17:59:54.576Z

The PR has been converted to draft mode while work is in progress.

This comment marks the beginning of an AI work session. Please wait for the session to finish, and provide your feedback.
upstream halves)

The two out-of-repo halves of #1946 are now fixed upstream and released:

- start-command 0.29.2 (link-foundation/start#138 / PR #139) records the docker image-preparation phase (docker pull / dind boot) in the $ session log, so $ --upload-log no longer returns a near-empty 546-byte log while a multi-GB image is still pulling. Bumped the pin 0.29.1 -> 0.29.2 in Dockerfile and Dockerfile.dind.
- box-dind 2.3.4 (link-foundation/box#106 / PR #107) makes the dind entrypoint verify host-image passthrough actually seeded the nested daemon and report 'finished WITH WARNINGS' instead of silently re-downloading ~30 GB. Bumped Dockerfile.dind base konard/box-dind 2.3.2 -> 2.3.5 (latest patch).

Updated the case study Upstream Follow-ups section and the changeset to record the fixed/released state and the pinned versions.
…ion tests

Dockerfile and coolify/Dockerfile pin konard/box, which the repo convention keeps in lockstep with the DinD base-image release (asserted by tests/test-docker-box-migration.mjs). Bumped both 2.3.2 -> 2.3.5 to match the Dockerfile.dind base bump, and updated the version assertions in test-docker-dind-variant.mjs (box-dind 2.3.5) and test-docker-box-migration.mjs (box 2.3.5).
… and pinned

Update the issue-1946 case study's Online-And-Source-Facts and Root-Cause-3 sections so they no longer describe the image-prep log gap and host-image re-download as still-open: start-command 0.29.2 and box-dind 2.3.4 fixed them and this repo now pins 0.29.2 / box-dind 2.3.5.
Working session summary

I have a background waiter (

Summary of the state: all changes are committed and pushed to

This summary was automatically extracted from the AI working session output.
🤖 Solution Draft Log

This log file contains the complete execution trace of the AI solution draft process.

💰 Cost estimation:

📊 Context and tokens usage:

Total: (9.2K new + 87.1K cache writes + 6.4M cache reads) input tokens, 31.1K output tokens, $4.589036 cost

🤖 Models used:

📎 Log file uploaded as Gist (2500KB)

The working session has now ended; feel free to review and add any feedback on the solution draft.
✅ Ready to merge

This pull request is now ready to be merged:

Monitored by hive-mind with --auto-restart-until-mergeable flag
Summary
Fixes the first-class part of #1946: when the Telegram bot launches a task with
--isolation docker, the session UUID and isolation backend are now surfaced,
and the session tracked, immediately, before the (potentially hour-long)
docker image pull / container startup finishes. Previously they only appeared
once the blocking start-command launch returned, so for the whole startup
window the run showed an info-less 🔄 Starting... and was not addressable by
/watch, /log or /status (the operator had to fall back to $ --list to
discover the id). This is why --isolation screen felt fine but docker did not:
screen launches in milliseconds, while docker's multi-GB pull made the same
post-launch timing pathological.

A full timeline, requirements list (R1–R10), root-cause analysis with log
evidence, alternatives considered, and solution plans are in
docs/case-studies/issue-1946/.

Root cause
The isolation session UUID is generated locally before start-command launches
the container, but the old flow generated it, then awaited
executeWithIsolation(...) to completion, and only on success called
trackSession() and edited the message to the session-aware form.
start-command's detached docker backend does not return until the image is
pulled and the container is running (~1 hour for
konard/hive-mind-dind:2.0.6), so the session stayed invisible and untracked
for the entire pull.
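The ordering fix can be illustrated with a minimal sketch. It is not the repository's code: trackSession and untrackSession are named in this PR, but launchWithEarlyTracking, showStarting, launch and the argument shapes are hypothetical stand-ins chosen for illustration.

```javascript
// Hypothetical sketch of "track and show first, await the launch second".
// Only trackSession/untrackSession are names taken from this PR; everything
// else (function name, parameters, return shape) is an assumption.
async function launchWithEarlyTracking({ sessionId, launch, trackSession, untrackSession, showStarting }) {
  // Make the session addressable and visible BEFORE the slow launch.
  trackSession(sessionId);
  await showStarting(sessionId);
  try {
    // May block for a long time while the image is pulled.
    await launch(sessionId);
    return { ok: true, sessionId };
  } catch (error) {
    // The session never ran, so drop the optimistic entry again.
    untrackSession(sessionId);
    return { ok: false, sessionId, error };
  }
}
```

In the old ordering described above, the tracking call sat after the awaited launch, so nothing was visible until the launch returned.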
What changed
- src/work-session-formatting.lib.mjs: formatStartingWorkSessionMessage()
  optionally renders the Session: and 🔒 Isolation: lines on the
  🔄 Starting... message (backward compatible: with no session it renders
  exactly as before).
- src/telegram-command-execution.lib.mjs: for isolation backends,
  buildExecuteAndUpdateMessage() now generates the UUID, builds sessionInfo,
  tracks the session and edits the message to the session-aware Starting…
  form before awaiting the launch, then awaits it (detached, so other bot
  commands keep working); on launch failure it untracks the optimistic session
  and clears sessionInfo.
- src/session-monitor.lib.mjs: new exported untrackSession() drops an
  optimistically-tracked session from the in-memory map and durable store
  without emitting a session_completed audit event (the session never ran).
- src/telegram-bot.mjs: wires untrackSession +
  formatStartingWorkSessionMessage into the shared execution path, so /solve,
  /hive and /task all inherit the fix (R6).

How to reproduce / verify
Before: launching /solve <url> --isolation docker left 🔄 Starting... with no
session info for the whole image pull; /watch, /log and /status could not find
the run. After: the Session: + 🔒 Isolation: docker lines appear immediately
and the session is addressable during startup.
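The before/after message forms can be illustrated with a small formatter sketch. This is an assumption-laden stand-in, not the real formatStartingWorkSessionMessage(): the function name formatStartingMessage, the sessionInfo fields (sessionId, isolation) and the exact line layout are hypothetical.

```javascript
// Hypothetical formatter: the real function's signature and output may differ.
// With no session info it returns the plain "Starting..." text, mirroring the
// backward-compatible behaviour described in this PR.
function formatStartingMessage(sessionInfo) {
  const lines = ['🔄 Starting...'];
  if (sessionInfo && sessionInfo.sessionId) {
    lines.push(`Session: ${sessionInfo.sessionId}`);
    if (sessionInfo.isolation) {
      lines.push(`🔒 Isolation: ${sessionInfo.isolation}`);
    }
  }
  return lines.join('\n');
}
```

Calling it with no argument yields only the first line; passing an object with a (made-up) sessionId and isolation: 'docker' appends the two extra lines.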
Automated regression: tests/test-issue-1946-docker-isolation-async.mjs
proves (via event ordering with a deferred launch promise) that the session is
tracked and shown before the launch resolves, stays tracked on success, and
is untracked on failure. tests/test-issue-1860-docker-isolation.mjs (33
assertions) was updated for the new deps and still passes. Full default
suite: 274/274 files pass.
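The "deferred launch promise" technique can be sketched as follows. This is a generic illustration, not the contents of the test file: createDeferred, recordLaunchOrdering, the event labels and the flow under test are all hypothetical.

```javascript
// A deferred is a promise whose resolution is controlled from outside,
// which lets a test hold the "launch" open while it inspects state.
function createDeferred() {
  let resolve;
  let reject;
  const promise = new Promise((res, rej) => {
    resolve = res;
    reject = rej;
  });
  return { promise, resolve, reject };
}

// Hypothetical harness: runs a flow with a launch that stays pending,
// records events, then releases the launch and returns the event order.
async function recordLaunchOrdering(runUnderTest) {
  const events = [];
  const deferred = createDeferred();
  const run = runUnderTest({
    trackSession: (id) => events.push(`tracked:${id}`),
    launch: () => {
      events.push('launch-started');
      return deferred.promise;
    },
  });
  // Give the flow a chance to progress while the launch is unresolved.
  await new Promise((resolve) => setTimeout(resolve, 0));
  events.push('launch-still-pending');
  deferred.resolve();
  await run;
  events.push('launch-resolved');
  return events;
}

// Example flow that tracks before awaiting the launch.
const trackFirstFlow = async ({ trackSession, launch }) => {
  trackSession('s1');
  await launch();
};
```

For trackFirstFlow the recorded order is tracked:s1, launch-started, launch-still-pending, launch-resolved; a flow that tracks only after the launch would record tracked:s1 after launch-still-pending, which is the regression such a test guards against.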
Upstream halves — fixed, released, and now pinned
The other two #1946 problems live outside this repo's code. They were reported
upstream, fixed there, released, and this PR pins the fixed versions in
Dockerfile / Dockerfile.dind:

- $ session log (the 546-byte log 7 minutes in; $ should preserve the full
  "prepare image + execute" log) → link-foundation/start#138 ("Detached docker
  session log omits the image-preparation phase (docker pull / dind boot) - $
  does not preserve the full log in one file"), fixed in start PR #139,
  released as start-command@0.29.2. Pin bumped 0.29.1 → 0.29.2 in both
  Dockerfiles.
- link-foundation/box#106 ("box-dind: nested daemon still re-downloads host
  images (~30 GB) despite passthrough", a continuation of box #94/#102), fixed
  in box PR #107, released as konard/box-dind:2.3.4. Dockerfile.dind base
  bumped konard/box-dind:2.3.2 → 2.3.5 (latest patch); the dind entrypoint now
  verifies passthrough actually seeded the nested daemon and reports
  finished WITH WARNINGS instead of silently re-downloading.

Each upstream issue includes a reproducer, a workaround, and a code-level fix
suggestion (bodies under docs/case-studies/issue-1946/upstream/). Hive Mind's
existing verbose diagnostics (preflightDockerIsolation,
logDockerIsolationPostLaunchDiagnostics) are retained so the next iteration
can confirm the upstream fixes from data.
Closes #1946