refactor(scheduler): get rid of app-level GPU management #19

Merged
dkeven merged 1 commit into feat/nvshare from scheduler/refactor/rm_app_gpu
May 21, 2026
dkeven (Member) commented May 21, 2026

Overview

Companion to Olares#3112, which moves all per-app GPU allocation logic into app-service. This PR strips the now-unused app-level GPU management surface from HAMi and extends the GPUBinding schema with the fields the new owner (app-service) needs to disambiguate bindings across multi-tenant installs.

Net change: 6 files, +56 / −1229.

What changes

1. Drop the app-level GPU management HTTP surface

pkg/scheduler/routes/gpu_manage.go (867 lines) is deleted in full. The following endpoints no longer exist on the HAMi scheduler:

  • GET /gpus
  • PUT /gpus/assignments/bulk
  • POST /gpus/:id/mode
  • POST /gpus/:id/assign
  • POST /gpus/:id/unassign

Their replacements live in app-service under /compute-resources/* (see the Olares PR). HAMi keeps the standard scheduler-extender surface (/filter, /bind, /webhook, /healthz) and that's it.

cmd/scheduler/main.go is updated accordingly — the routes above are removed, and the CleanupGPUBindingsLoop goroutine is no longer started because app-service is now the sole writer of GPUBinding objects and is responsible for their lifecycle. The --cleanup-startup-delay flag stays (it still gates CleanupPodsWithMissingDevicesLoop).
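The resulting HTTP surface can be sketched as follows. This is a minimal illustration using only the Go standard library, not the routing code in cmd/scheduler/main.go; the helper names `newExtenderMux` and `statusFor` are hypothetical, and the handlers are stubs that only return 200.

```go
package main

import (
	"net/http"
	"net/http/httptest"
)

// newExtenderMux is a hypothetical helper that registers only the
// scheduler-extender paths the PR says remain after the refactor.
// The handlers are stubs; the real ones live in HAMi.
func newExtenderMux() *http.ServeMux {
	mux := http.NewServeMux()
	ok := func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
	}
	for _, path := range []string{"/filter", "/bind", "/webhook", "/healthz"} {
		mux.HandleFunc(path, ok)
	}
	return mux
}

// statusFor sends one in-memory request to the mux and returns the
// HTTP status code, which makes it easy to probe for removed routes.
func statusFor(mux *http.ServeMux, method, path string) int {
	rec := httptest.NewRecorder()
	mux.ServeHTTP(rec, httptest.NewRequest(method, path, nil))
	return rec.Code
}
```

With this stub, `statusFor(newExtenderMux(), "GET", "/healthz")` returns 200, while any of the removed `/gpus*` paths falls through to the mux's default 404 handler.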

2. GPUBinding CRD gains spec.owner and spec.namespace

spec:
  appName: <string>            # existing
  uuid:    <string>            # existing
  owner:     <string>          # NEW (optional)
  namespace: <string>          # NEW (optional)
  podSelector: <LabelSelector> # existing
  memory:      <Quantity>      # existing

Both charts/hami/crds/gpu.bytetrade.io_gpubindings.yaml and pkg/api/gpu/v1alpha1/gpubinding_types.go are updated. The fields are optional for backward compatibility, but app-service always populates them on newly-created bindings so the scheduler can tell apart two installs of the same app by different users.
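As a rough picture of the schema change, the spec could look like the struct below. This is a simplified sketch, not the contents of gpubinding_types.go: the JSON field names come from the YAML above, but the `omitempty` tags are an assumption, and `PodSelector` and `Memory` use plain Go types as stand-ins for the Kubernetes `LabelSelector` and `Quantity` types so the example has no external dependencies. `decodeSpec` is a hypothetical helper.

```go
package main

import "encoding/json"

// GPUBindingSpec is a simplified stand-in for the CRD spec.
// Owner and Namespace are the new optional fields.
type GPUBindingSpec struct {
	AppName   string `json:"appName"`
	UUID      string `json:"uuid"`
	Owner     string `json:"owner,omitempty"`     // new, optional (tag is an assumption)
	Namespace string `json:"namespace,omitempty"` // new, optional (tag is an assumption)
	// Stand-ins for metav1.LabelSelector and resource.Quantity.
	PodSelector map[string]string `json:"podSelector,omitempty"`
	Memory      string            `json:"memory,omitempty"`
}

// decodeSpec parses a JSON-encoded spec. Fields that are absent
// from the input keep their zero value (the empty string).
func decodeSpec(raw string) (GPUBindingSpec, error) {
	var s GPUBindingSpec
	err := json.Unmarshal([]byte(raw), &s)
	return s, err
}
```

Because the new fields are optional, a binding written before this change decodes cleanly with `Owner` and `Namespace` left empty, which is what lets older objects keep working.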

3. Scheduler filter matches on the new identity tuple

pkg/scheduler/scheduler.go shrinks from ~400 lines of GPU-management logic to ~47 lines focused purely on the filter path. The scheduler still selects which GPUs to bind a pod to, but it no longer:

  • runs a periodic CleanupGPUBindingsLoop (app-service handles this in install/suspend/uninstall),
  • exposes selectDynamicGPUCandidates for dynamic candidate picking (app-service does the picking and writes the binding before the pod is scheduled).

Matching is now keyed off a new appBindingIdentity{appName, owner, namespace} tuple, computed from the pod's labels (applications.app.bytetrade.io/name, applications.app.bytetrade.io/owner) and the pod's namespace. As a result, a pod from user alice and a pod from user bob for the same comfyui app never match each other's bindings.
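The identity-based match can be sketched as below. This is an illustration under stated assumptions, not the code in pkg/scheduler/scheduler.go: `identityFromPod`, `bindingSpec`, and `matches` are hypothetical names, the `AppNameLabelKey` value is inferred from the label named above, and treating each empty binding field as "match anything" is one possible reading of the backward-compatibility rule for bindings that lack owner and namespace.

```go
package main

const (
	// Value inferred from the label named in the PR text.
	AppNameLabelKey  = "applications.app.bytetrade.io/name"
	AppOwnerLabelKey = "applications.app.bytetrade.io/owner"
)

// appBindingIdentity is the tuple the scheduler matches on.
type appBindingIdentity struct {
	appName, owner, namespace string
}

// identityFromPod builds the tuple from a pod's labels and namespace.
// Missing labels yield empty strings.
func identityFromPod(labels map[string]string, namespace string) appBindingIdentity {
	return appBindingIdentity{
		appName:   labels[AppNameLabelKey],
		owner:     labels[AppOwnerLabelKey],
		namespace: namespace,
	}
}

// bindingSpec holds only the GPUBinding fields relevant to matching.
type bindingSpec struct {
	AppName, Owner, Namespace string
}

// matches reports whether a binding applies to this identity.
// Assumption: an empty Owner or Namespace on the binding is treated
// as a wildcard, so legacy bindings still match by appName alone.
func (id appBindingIdentity) matches(b bindingSpec) bool {
	if b.AppName != id.appName {
		return false
	}
	if b.Owner != "" && b.Owner != id.owner {
		return false
	}
	if b.Namespace != "" && b.Namespace != id.namespace {
		return false
	}
	return true
}
```

Under this sketch, a binding owned by alice is rejected for bob's pod even though both carry the same app name, while a legacy binding with only `AppName` set matches either pod.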

4. New AppOwnerLabelKey constant

pkg/util/types.go adds:

AppOwnerLabelKey = "applications.app.bytetrade.io/owner"

Mirrors the existing AppNameLabelKey; used by the scheduler when building the identity tuple above.

Compatibility notes

  • Existing GPUBinding objects without spec.owner / spec.namespace continue to match by appName alone. Olares ships a migration (in Olares#3112 follow-ups) that backfills these fields during upgrade.
  • Any external tool that called the removed /gpus* endpoints must move to the new /compute-resources/* surface on app-service. There are no in-cluster callers of the old endpoints in beclab.
  • Standard HAMi scheduler-extender contract (/filter, /bind, /webhook) is unchanged — Kubernetes scheduler integration is untouched.

Test plan

  • go build ./... and go vet ./...
  • Helm install on a single-node NVIDIA cluster with the new CRD; verify the scheduler pod comes up healthy and /healthz responds
  • Trigger an install from app-service (via Olares#3112) and verify the scheduler picks the same GPU UUID app-service wrote into the GPUBinding
  • Two users install the same app on a multi-GPU node; verify pods bind to their own GPUBinding rows and don't pick each other's UUIDs
  • Confirm no client in the repo still calls the removed /gpus* routes

Made with Cursor

@dkeven dkeven merged commit 2c54420 into feat/nvshare May 21, 2026
1 check passed
@dkeven dkeven deleted the scheduler/refactor/rm_app_gpu branch May 21, 2026 05:47