
feat: add module backend assignment support #1500

Merged
leejet merged 1 commit into master from split-backend on May 16, 2026

@leejet (Owner) commented on May 16, 2026

Summary

  • Add --backend and --params-backend module assignment support.
    • Supports module assignments such as te=cpu,vae=cuda0,diffusion=vulkan0.
  • Add SDBackendManager to own and share backend instances by resolved backend name.
  • Pass the runtime and parameter backends into GGMLRunner externally, as constructor arguments supplied by the caller.
  • Split ggml_extend_backend.hpp into ggml_extend_backend.h/.cpp.
  • Document backend selection behavior in docs/backend.md.
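A minimal sketch of two ideas from the summary: parsing a module assignment such as te=cpu,vae=cuda0,diffusion=vulkan0, and sharing one backend instance per resolved backend name. parse_module_assignments, FakeBackend and BackendCache are hypothetical names, not the SDBackendManager API. FakeBackend stands in for a real ggml backend handle, and falling back to a default backend for unassigned modules is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical parser: "te=cpu,vae=cuda0" -> {"te": "cpu", "vae": "cuda0"}.
std::map<std::string, std::string> parse_module_assignments(const std::string& spec) {
    std::map<std::string, std::string> out;
    std::stringstream ss(spec);
    std::string item;
    while (std::getline(ss, item, ',')) {
        if (item.empty()) continue;  // tolerate empty items such as a trailing comma
        auto eq = item.find('=');
        if (eq == std::string::npos || eq == 0 || eq + 1 == item.size()) {
            throw std::invalid_argument("bad module assignment: " + item);
        }
        out[item.substr(0, eq)] = item.substr(eq + 1);
    }
    return out;
}

// Placeholder for a real ggml backend handle (assumption).
struct FakeBackend {
    std::string name;
};

// Hypothetical cache: one instance per resolved name, shared by every module mapped to it.
class BackendCache {
public:
    explicit BackendCache(std::string default_name) : default_name_(std::move(default_name)) {}

    std::shared_ptr<FakeBackend> get(const std::string& name) {
        auto it = instances_.find(name);
        if (it != instances_.end()) return it->second;  // reuse the existing instance
        auto backend = std::make_shared<FakeBackend>(FakeBackend{name});
        instances_.emplace(name, backend);
        return backend;
    }

    // Unassigned modules fall back to the default backend (assumption of this sketch).
    std::shared_ptr<FakeBackend> for_module(const std::map<std::string, std::string>& assignments,
                                            const std::string& module) {
        auto it = assignments.find(module);
        return get(it != assignments.end() ? it->second : default_name_);
    }

    std::size_t size() const { return instances_.size(); }

private:
    std::string default_name_;
    std::map<std::string, std::shared_ptr<FakeBackend>> instances_;
};
```

Because instances are keyed by resolved name, te=cuda0,vae=cuda0 would share a single cuda0 instance in this sketch rather than creating two.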

Related Issue / Discussion

Since #1184, there have been multiple PRs containing backend-selection-related changes. However, none of them achieved the behavior I expected, and their implementations were intertwined with other unrelated changes. I decided to write a standalone implementation myself that better matches my design goals, while also adding support for choosing the backend that stores module parameters.

This PR was inspired by the work in stduhpf’s PR. When merging this PR, I will add @stduhpf as a co-author.

Additional Information

  • Ran smoke tests with several model pipelines


leejet merged commit 3633072 into master on May 16, 2026
12 checks passed
leejet deleted the split-backend branch on May 16, 2026 at 13:47
fszontagh added a commit to fszontagh/stable-diffusion.cpp that referenced this pull request on May 22, 2026
13 new upstream commits since the previous sync at 0b82969. The big one is
leejet#1500 (module backend assignment): ~1.5k LOC of churn that splits
backend code into a new ggml_extend_backend.{h,cpp} pair and replaces every
runner's (backend_t backend, bool offload_params_to_cpu) constructor
arguments with (backend_t runtime, backend_t params). It also adds the new
CLI flags --backend te=cpu,vae=cuda0,... and --params-backend te=cpu,vae=cpu,...

Other notable upstream changes folded in:
  3633072 module backend assignment (leejet#1500)
  38b14ad --max-vram -1 auto-detect (leejet#1498)
  67dda3f LTX 2.3 architecture (leejet#1463)
  06accf2 LTXAV latent2rgb projection
  9d68341 Euler/DDIM unification (leejet#1474)
  cde20d5 stereo handling in sd_audio
  d7ecbe1 T5 EOS dedup in Anima
  bd17f53 / 0c1ca17 / 839f6a9 / 3b4d26f ROCm/docs/CI
  db08b84 GCC 16 build fix
  686856e fake-VAE log demotion
  0b82969 / 381e0df PR template + CONTRIBUTING.md

Conflicts:

- examples/common/common.cpp, include/stable-diffusion.h: kept our
  offload_config alongside upstream's new backend/params_backend
  strings. sd_ctx_params_t now carries both axes.

- src/lora.hpp: dropped our enable_offload bool. The new params_backend
  argument expresses the same intent (CPU = offload).

- src/hidream_o1.hpp: kept params_prefix member, switched constructor
  to upstream's (backend, params_backend) signature.

- src/stable-diffusion.cpp: every runner-construction site now uses
  upstream's backend_for(MODULE) / params_backend_for(MODULE) lookups.
  Removed the dead cond_stage/diffusion/vae_offload_to_cpu local-bool
  derivation and replaced it with calls to a new
  SDBackendManager::force_module_params_backend(MODULE, "cpu") helper
  that mutates params_assignment_ after init_backend() runs. The
  offload_config-driven escalations now land in the same data
  structure that upstream's --params-backend writes to.
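A sketch of the post-init override described above, assuming string-valued assignments keyed by module. BackendManagerSketch and the Module enum are hypothetical; the method names mirror those in this commit message, but their signatures and fallback rules here (an unassigned runtime backend falls back to a default, an unassigned params backend falls back to the runtime backend) are assumptions, not SDBackendManager's actual code.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical module identifiers.
enum class Module { TE, DIFFUSION, VAE };

// Sketch only: internals and fallback rules are assumptions.
class BackendManagerSketch {
public:
    // Stand-in for init_backend(): record the --backend / --params-backend assignments.
    void init_backend(std::map<Module, std::string> runtime,
                      std::map<Module, std::string> params,
                      std::string default_backend) {
        runtime_assignment_ = std::move(runtime);
        params_assignment_ = std::move(params);
        default_backend_ = std::move(default_backend);
    }

    std::string backend_for(Module m) const {
        auto it = runtime_assignment_.find(m);
        return it != runtime_assignment_.end() ? it->second : default_backend_;
    }

    // Without an explicit params assignment, weights stay on the runtime backend (assumption).
    std::string params_backend_for(Module m) const {
        auto it = params_assignment_.find(m);
        return it != params_assignment_.end() ? it->second : backend_for(m);
    }

    // Post-init override, e.g. an offload policy moving one module's weights to CPU.
    void force_module_params_backend(Module m, const std::string& name) {
        params_assignment_[m] = name;
    }

private:
    std::map<Module, std::string> runtime_assignment_;
    std::map<Module, std::string> params_assignment_;
    std::string default_backend_;
};
```

Because the override writes into the same map as the explicit params assignment, every later params_backend_for lookup sees it without a separate offload flag.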

Post-merge fixups surfaced by retesting HiDream O1 streaming:

- src/llm.hpp: TextModel.forward_final_norm now casts to LLMRMSNorm,
  not RMSNorm. Upstream changed the "norm" block's concrete type;
  our pre-merge cast returned nullptr, which crashed on the first forward().

- src/hidream_o1.hpp: Stage 1 of compute_streaming_true now scales
  inputs_embeds by sqrt(hidden_size) when params.llm.normalize_input is
  set, matching what forward_embeds does. This is a no-op for HiDream O1
  today, but it keeps the streaming path drift-free if a future
  architecture enables it.
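Both fixups can be sketched in isolation. The Block, RMSNorm and LLMRMSNorm types below are toy stand-ins, not the real classes, used only to show that casting a block to a type it does not derive from yields nullptr, which must be checked before use. get_block_as and maybe_scale_input_embeds are hypothetical helpers, and the scaling operates on a plain std::vector rather than a ggml tensor.

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Toy block hierarchy; only the casting behaviour mirrors the fix described above.
struct Block {
    virtual ~Block() = default;
};
struct RMSNorm : Block {};
struct LLMRMSNorm : Block {};  // sibling type, not a subclass of RMSNorm

// Returns nullptr for a missing key or a type mismatch; callers must check the result.
template <typename T>
std::shared_ptr<T> get_block_as(const std::map<std::string, std::shared_ptr<Block>>& blocks,
                                const std::string& key) {
    auto it = blocks.find(key);
    if (it == blocks.end()) return nullptr;
    return std::dynamic_pointer_cast<T>(it->second);
}

// Multiply embeddings by sqrt(hidden_size) only when input normalization is requested.
void maybe_scale_input_embeds(std::vector<float>& embeds, int hidden_size, bool normalize_input) {
    if (!normalize_input) return;
    const float scale = std::sqrt(static_cast<float>(hidden_size));
    for (float& x : embeds) x *= scale;
}
```

Keeping the scaling behind the same flag in both code paths is what prevents the streaming and non-streaming forward passes from diverging.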

Smoke-tested on a 12 GB GPU:
  Z-Image-Turbo Q8 layer_streaming     -> 4.32 s
  HiDream O1 bf16 dev layer_streaming  -> 17.44 s (4 steps, 1024x1024)