Add CUDA Docker image support #2053
Draft
cameronbergh wants to merge 1 commit into
Tested this PR on an NVIDIA DGX Spark running ARM64 Ubuntu 24.04. First, thanks for putting this together. This is the first path I’ve found that gets EXO close to running on the DGX Spark CUDA stack.
Environment
Build result
The image built successfully:
docker build -t exo:cuda13-pr2053 .
Runtime issue 1: resources/dashboard path
When running the image directly, EXO failed with:
FileNotFoundError: Unable to locate resources. Did you clone the repo properly?
I was able to get past that by bind-mounting the repo separately and setting EXO_RESOURCES_DIR:
docker run --rm -it \
  --gpus all \
  --network host \
  -e EXO_LIBP2P_NAMESPACE=pulmhealth_cuda_test \
  -e EXO_RESOURCES_DIR=/repo \
  -v "$PWD":/repo \
  -v /mnt/exo-models/models:/root/.local/share/exo/models \
  exo:cuda13-pr2053-patched
This suggests the runtime image may not include or locate the dashboard/resources path correctly.
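For illustration, the lookup order implied by the workaround above (an explicit EXO_RESOURCES_DIR first, then built-in locations) can be sketched as follows. This is not EXO's actual resolver: the function name, the fallback list, and the assumption that a valid base directory contains a `resources` subdirectory are all hypothetical.

```python
from pathlib import Path
from typing import Iterable, Mapping, Optional


def find_resources_dir(
    env: Mapping[str, str],
    fallbacks: Iterable[Path],
    marker: str = "resources",  # hypothetical marker subdirectory
) -> Optional[Path]:
    """Return the first candidate directory that contains `marker`, or None.

    An explicit EXO_RESOURCES_DIR is tried before any fallback location.
    """
    candidates = []
    override = env.get("EXO_RESOURCES_DIR")
    if override:
        candidates.append(Path(override))
    candidates.extend(fallbacks)
    for base in candidates:
        if (base / marker).is_dir():
            return base
    return None
```

Returning None instead of raising lets the caller report every location that was tried, which would make a failure like the FileNotFoundError above easier to diagnose inside a container.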
Runtime issue 2: mlx_lm expects new_thread_local_stream
After getting past the resources issue, the runner repeatedly crashed on Linux/CUDA with:
AttributeError: module 'mlx.core' has no attribute 'new_thread_local_stream'
Relevant trace:
  File "/app/.venv/lib/python3.13/site-packages/mlx_lm/generate.py", line 226, in <module>
    generation_stream = mx.new_thread_local_stream(mx.default_device())
Inside the container:
import mlx.core as mx
print(hasattr(mx, "new_thread_local_stream"))
# False
print([x for x in dir(mx) if "stream" in x.lower()])
# ['Stream', 'StreamContext', 'default_stream', 'new_stream', 'set_default_stream', 'stream']
So the installed Linux/CUDA mlx.core exposes new_stream, but not new_thread_local_stream, while the mlx-lm branch used by this PR expects new_thread_local_stream.
Temporary compatibility shim
This shim allowed mlx_lm to import successfully inside the container:
import mlx.core as mx
if not hasattr(mx, "new_thread_local_stream") and hasattr(mx, "new_stream"):
    mx.new_thread_local_stream = mx.new_stream
import mlx_lm
I then patched src/exo/worker/runner/bootstrap.py immediately before importing/applying the MLX patches:
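try:
    import mlx.core as mx
    # Alias the missing API to new_stream before mlx_lm is imported.
    if not hasattr(mx, "new_thread_local_stream") and hasattr(mx, "new_stream"):
        mx.new_thread_local_stream = mx.new_stream
except Exception:
    # Best effort only: if MLX cannot be imported, the real error surfaces below.
    pass
from exo.worker.engines.mlx.patches import apply_mlx_patches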
After rebuilding the image with that shim, the previous new_thread_local_stream import crash appears to be resolved.
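A slightly stricter variant of the shim is sketched below. It reports whether the alias was installed, so the bootstrap could log it, and it does not swallow unrelated exceptions. The function name is hypothetical, and whether new_stream is a safe stand-in for new_thread_local_stream (in particular with respect to thread-local behavior) is an assumption that the traces above do not settle. A stand-in namespace replaces mlx.core so the sketch runs without MLX installed.

```python
import types


def install_stream_shim(mx) -> bool:
    """Alias new_thread_local_stream to new_stream when only the latter exists.

    Returns True if the alias was installed, False if nothing was changed.
    """
    if hasattr(mx, "new_thread_local_stream"):
        return False  # Native API is present; leave it alone.
    if not hasattr(mx, "new_stream"):
        return False  # Nothing to alias to; let the original error surface.
    mx.new_thread_local_stream = mx.new_stream
    return True


# Stand-in for mlx.core (assumption: only used to exercise the function here).
fake_mx = types.SimpleNamespace(new_stream=lambda device: ("stream", device))
installed = install_stream_shim(fake_mx)
```

In the real bootstrap the call would presumably be install_stream_shim(mlx.core), placed before the first import of mlx_lm.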
Suggested fixes
Potential fixes to consider:
- Align the Linux/CUDA mlx, mlx-cuda-13, and mlx-lm pins so mlx_lm does not call APIs missing from Linux/CUDA MLX.
- Add a small Linux/CUDA compatibility shim for new_thread_local_stream if new_stream is the intended equivalent.
- Ensure dashboard/resources are included in the runtime image, or document the required EXO_RESOURCES_DIR and bind-mount pattern.
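When checking whether the pins are aligned, it may help to print the versions that actually ended up in the image. A minimal sketch using the standard library's importlib.metadata follows; the distribution names are taken from the list above and may not match the names used on the package index.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Distribution names as written in the comment above (assumed, not verified).
report = installed_versions(["mlx", "mlx-lm", "mlx-cuda-13"])
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```

Running this inside the container alongside the hasattr check shown earlier would indicate whether the missing attribute comes from an older mlx build or from a backend-specific difference.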
I’m continuing to test this with multiple DGX Spark nodes in an isolated namespace and can validate another branch or image if helpful.
Author
Holding this as draft until we validate with a real Qwen 4B model on Mac MLX, Linux CUDA, and a mixed Mac+CUDA cluster.
7b4fa84 to a864666
a864666 to 23e0de2
Author
Validation update for
Also fixed two Docker issues discovered during validation:
Author
Follow-up validation with a longer prompt and longer generation (
Note: the earlier validation comment used a very short
Winston-9527 pushed a commit to Winston-9527/exo that referenced this pull request May 22, 2026
This commit enables exo to run on NVIDIA GPUs on Linux by fixing Metal-specific assumptions in the MLX inference path.

Changes:
- Add CUDA compatibility shim for mlx-lm's new_thread_local_stream API (Linux CUDA MLX exposes new_stream instead)
- Gate MLX_METAL_FAST_SYNCH env var to macOS only, preventing warnings on Linux
- Make set_wired_limit_for_model handle CUDA backends gracefully by checking mx.metal.is_available() first
- Add automatic LD_LIBRARY_PATH setup in runner bootstrap for CUDA libraries (libcublasLt.so.13, etc.)

Compatibility:
- Zero breaking changes - all modifications are platform-gated
- macOS Metal path unchanged
- CPU-only Linux still works
- Enables heterogeneous clusters (macOS Metal + Linux CUDA)

Tested on:
- Linux: NVIDIA RTX 3090, CUDA 13.1, Driver 590.48.01
- macOS: MacBook Pro M5 Pro (Metal)
- Verified cross-platform cluster inference with Qwen3-0.6B-8bit

Refs: PR exo-explore#2053 (Docker support) - this provides the native code changes needed for Linux CUDA deployment without requiring Docker.
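The platform gating described in that commit message can be illustrated with a small pure function. This is a sketch of the idea only, not the code from the referenced commit: the function name, the value "1" for MLX_METAL_FAST_SYNCH, and the example library directory are assumptions.

```python
from typing import Dict, Iterable, Mapping


def platform_env(
    system: str,
    env: Mapping[str, str],
    cuda_lib_dirs: Iterable[str] = (),
) -> Dict[str, str]:
    """Return a copy of `env` with platform-specific variables applied.

    `system` is expected to be the value of platform.system().
    """
    out = dict(env)
    if system == "Darwin":
        # Metal-only setting; the value "1" is an assumption.
        out.setdefault("MLX_METAL_FAST_SYNCH", "1")
    elif system == "Linux":
        # Prepend CUDA library directories without duplicating existing entries.
        existing = [p for p in out.get("LD_LIBRARY_PATH", "").split(":") if p]
        extra = [d for d in cuda_lib_dirs if d not in existing]
        if extra or existing:
            out["LD_LIBRARY_PATH"] = ":".join(extra + existing)
    return out


# Hypothetical directory, used only to show the resulting search path.
linux_env = platform_env("Linux", {"LD_LIBRARY_PATH": "/usr/lib"}, ["/opt/cuda/lib"])
```

Because the dynamic loader reads LD_LIBRARY_PATH when a process starts, an environment built this way is useful when passed to a child process (for example through subprocess.Popen(..., env=...)) rather than applied to the process that is already running.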
Summary
cuda13 extra instead of changing default Linux dependencies
Notes
This ports the useful Docker pieces from #1317 onto current main without making CUDA the default Linux install path. Non-CUDA Linux users should continue to use the existing CPU/default paths.
Validation
- ruff check .
- basedpyright --project pyproject.toml
- cd dashboard && npm install && npm run build
- pytest src → 410 passed, 1 skipped, 187 deselected
- docker build -t exo:cuda13-pr1317 . completed successfully
- libcuda.so.1 were exposed manually: default Device(gpu, 0) array [2, 3, 4]
Known follow-up
The test host's Docker daemon does not currently have NVIDIA Container Toolkit/CDI configured, so docker run --gpus all ... fails with:
Manual device/library exposure verified that the image itself can import and execute MLX on the RTX 3060 GPU.
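A quick way to tell whether the driver library is visible inside a container, before involving MLX at all, is to try loading it with ctypes. The sketch below is a generic probe, not part of this PR; the helper name is hypothetical and the default library name is taken from the validation notes above.

```python
import ctypes


def can_load(library: str = "libcuda.so.1") -> bool:
    """Return True if the dynamic loader can open `library`, else False."""
    try:
        ctypes.CDLL(library)
    except OSError:
        return False
    return True
```

A False result for libcuda.so.1 would point at missing device/library exposure (for example, no NVIDIA Container Toolkit on the host) rather than at a problem in the image itself.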