Skip to content

sgl-project/rbg

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

428 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

RoleBasedGroup (RBG) 🚀

English | 简体中文

License GitHub release Go Report Card

🎯 A Kubernetes API for orchestrating distributed, stateful AI inference workloads with multi-role collaboration and built-in service discovery.

🌐 Official Website: rolebasedgroup.github.io


🏗️ Architecture

RBG Architecture


📰 Latest News

Date Release Highlights
2026-06-11 v0.7.0 v1alpha2 API stable release, conversion webhooks, CLI multi-node LLM serving, pod port allocator, coordinated policies, gang scheduling
2026-02-18 v0.6.0 Coordinated scaling, stateful InstanceSet
2025-12-03 v0.5.0 Native InstanceSet, in-place updates, Mooncake integration
2025-09-23 v0.4.0 RBGS scaling, Volcano podgroup support

🤔 Why RBG?

Traditional Kubernetes primitives (StatefulSets / Deployments) struggle with LLM inference services that:

Challenge Description
Multi-role topologies gateway → router → prefill → decode
Performance-sensitive GPU/network topology matters
Atomic operations deploy, upgrade, scale, failover across roles

RBG treats an inference service as a role-based group — a topologized, stateful, coordinated multi-role organism managed as a single unit.


🎯 Key Concepts

Concept Description
Role Basic scheduling and rollout unit. Each role (prefill, decode) has its own spec, lifecycle and policies.
RoleBasedGroup A group of roles forming one logical service (e.g., one LLM inference deployment).
RoleInstance A collection of Pods with tightly bound lifecycle. Supports in-place updates and controls upgrades/status for the Pod group.
CoordinatedPolicy A separate CRD for coordinating operations across roles. Controls maxSkew and progression during rolling updates and scaling.

✨ Key Features — SCOPE

Capability Description
Stable Topology-aware deterministic operations with unique RoleID injection
Coordination Cross-role policy engine: deployment pairing, coordinated upgrades, linked recovery
Orchestration Role dependencies, precise startup sequences, topology self-aware service discovery
Performance Hardware affinity scheduling: GPU-NVLink → PCIe → RDMA → VPC
Extensible Declarative APIs and plugin mechanisms for future architectures

🚀 Getting Started

📦 Installation

Install from GitHub Releases (latest version):

VERSION=$(curl -sL https://api.github.com/repos/sgl-project/rbg/releases/latest | grep '"tag_name"' | sed -E 's/.*"v([^"]+)".*/\1/')
helm upgrade --install rbgs https://github.com/sgl-project/rbg/releases/download/v$VERSION/rbgs-$VERSION.tgz \
            --namespace rbgs-system --create-namespace --wait

For detailed instructions, see Installation Guide.

🎮 Quick Start

Deploy a basic RoleBasedGroup with two roles and startup dependencies:

apiVersion: workloads.x-k8s.io/v1alpha2
kind: RoleBasedGroup
metadata:
  name: nginx-cluster
spec:
  roles:
    - name: frontend
      replicas: 1
      standalonePattern:
        template:
          spec:
            containers:
              - name: nginx
                image: nginx:1.14.1
                ports:
                  - containerPort: 80

    - name: backend
      replicas: 3
      dependencies: ["frontend"]  # backend starts after frontend is ready
      standalonePattern:
        template:
          spec:
            containers:
              - name: nginx
                image: nginx:1.14.1
                ports:
                  - containerPort: 8080

Deployment Patterns

Pattern Used For Description
standalonePattern Single-node deployment Single pod per instance
leaderWorkerPattern Multi-node distributed deployment Leader + workers for tensor parallelism

RoleTemplates

Reduce configuration duplication with reusable templates:

spec:
  roleTemplates:
    - name: base-template
      template:
        spec:
          containers:
            - name: nginx
              image: nginx:1.14.1

  roles:
    - name: frontend
      replicas: 2
      standalonePattern:
        templateRef:
          name: base-template

    - name: backend
      replicas: 3
      standalonePattern:
        templateRef:
          name: base-template
          patch:  # role-specific overrides
            spec:
              containers:
                - name: nginx
                  resources:
                    requests:
                      memory: "128Mi"

🖥️ CLI Tool

kubectl-rbg is a CLI tool for managing RBG resources and LLM deployments.

Installation

# Build from source
make build-cli
chmod +x bin/kubectl-rbg
sudo mv bin/kubectl-rbg /usr/local/bin/

🧠 Inference Examples

Prefill/Decode Disaggregated

SGLang PD-disaggregated examples in examples/inference/:

Example Pattern Description
pd-disagg-standalone.yaml standalonePattern Single pod per role, suitable for single-GPU instances
pd-disagg-leader-worker.yaml leaderWorkerPattern Multi-GPU tensor parallelism for decode role

Aggregated Inference

SGLang aggregated examples:

Example Pattern Description
agg-standalone.yaml standalonePattern Single-GPU aggregated inference
agg-leader-worker.yaml leaderWorkerPattern Multi-GPU tensor parallelism

🔗 Ecosystem Integration

RBG integrates with ecosystem components for production LLM inference:

NVIDIA Dynamo

NVIDIA Dynamo is an open-source, datacenter-scale inference stack that orchestrates multi-node AI workloads above inference engines like vLLM and SGLang:

Example Description
dynamo/pd-disagg.yaml PD-disaggregated with Dynamo SGLang runtime
dynamo/pd-disagg-multi-nodes.yaml Multi-node PD-disaggregated
dynamo/agg.yaml Aggregated inference with Dynamo
dynamo/agg-multi-nodes.yaml Multi-node aggregated

Mooncake

Mooncake is a disaggregated architecture for LLM serving, providing KV cache transfer and reuse across distributed inference:

Example Description
mooncake-store/pd-disagg-kvcache-reuse-with-mooncake.yaml PD-disaggregated with KV cache reuse
mooncake-store/agg-kvcache-reuse-with-mooncake.yaml Aggregated with KV cache reuse
mooncake-transfer-engine/sgl-pd-disagg-with-mooncake-te.yaml SGLang PD-disaggregated with transfer engine
mooncake-transfer-engine/vllm-pd-disagg-with-mooncake-te.yaml vLLM PD-disaggregated with transfer engine

📂 Examples Directory

🧱 Basic Examples (examples/basic/)

Path Description
rbg/base.yaml Basic RoleBasedGroup with role dependencies
rbg/dependency/ Role dependency configurations
rbg/patterns/ Deployment patterns: standalone, leader-worker, custom-components
rbg/scheduling/ Gang scheduling: Volcano, scheduler-plugins
rbg/update-strategy/ Rolling update with partition support
rbg/restart-policy/ Restart policy configurations
rbg/scaling/ Scaling adapter with HPA integration
rbg/role-template/ RoleTemplates for reducing duplication
coordinated-policy/ Coordinated rollout and scaling policies
engine-runtime/ Engine runtime profile configurations

🧠 Inference Examples (examples/inference/)

Path Description
agg-standalone.yaml Aggregated SGLang (standalone pattern)
agg-leader-worker.yaml Aggregated (leader-worker pattern)
pd-disagg-standalone.yaml Prefill/Decode disaggregated (standalone)
pd-disagg-leader-worker.yaml Prefill/Decode disaggregated (leader-worker)
ecosystem/ NATS, etcd, Dynamo, Mooncake integration
ecosystem/dynamo/ NVIDIA Dynamo runtime examples
ecosystem/mooncake/ Mooncake KV cache transfer engine

📚 Documentation

Source Link
Official Docs rolebasedgroup.github.io
Local Docs doc/TOC.md

Version Compatibility

RBG Version Kubernetes LeaderWorkerSet
v0.7.0 >=v1.22.x Not Required
v0.6.0 >=v1.28.x >=v0.7.0
v0.5.0 >=v1.28.x >=v0.6.0
v0.4.0 >=v1.28.x >=v0.7.0

🌐 Ecosystem Projects

The rolebasedgroup GitHub organization hosts companion projects that extend RBG with autoscaling, CLI tooling, AI agent integration, and documentation:

Project Description
rbg-planner Engine-agnostic, SLA-driven autoscaler for LLM inference on Kubernetes. Supports SGLang, vLLM, NVIDIA Dynamo via pluggable metrics adapters. Uses ARIMA-based load prediction and automatic SLA profiling to scale prefill/decode roles to meet TTFT/ITL latency targets.
inference-engine-runtime Python-based sidecar runtime for AI inference engines. Provides LoRA adapter management, unified Prometheus metrics, and distributed topology management for SGLang and vLLM engines.
inference-ext-cli RBG CLI extension (llmctl) for LLM inference workload management. Provides service/model management, benchmark orchestration, automated parameter search (Optuna), convergence analysis, and web dashboards for experiment visualization.
rbg-agent-guide AI agent skill guides for RBG operations. Provides deployment skills for AI coding assistants (e.g., Claude Code) to help users deploy LLM models to Kubernetes using RBG CRD and CLI.
rolebasedgroup.github.io Official RBG documentation website built with Docusaurus, deployed at rolebasedgroup.github.io.

🤝 Contributing

We welcome contributions! See CONTRIBUTING.md for guidelines.

# Verify copyright headers
make copyright-check

# Add missing headers
make copyright-fix

💬 Community

Channel Link
Slack #rbg channel
Issues GitHub Issues
Discussions Community Discussions

📜 Code of Conduct

This project follows the Kubernetes Code of Conduct.


🙏 Acknowledgment

RBG is inspired by and reuses code from LeaderWorkerSet (LWS).

About

A workload for deploying LLM inference services on Kubernetes

Topics

Resources

License

Contributing

Stars

Watchers

Forks

Packages

 
 
 

Contributors

Languages