21 changes: 20 additions & 1 deletion docs/sagemaker/source/dlcs/available.md
@@ -38,9 +38,18 @@ In case you want to serve text generation models with vLLM, there are specific D

| vLLM version | Container URI | Accelerator |
| -------------- | -------------------------------------------------------------------------------------------------------------------------------- | ----------- |
| 0.17.0 | 763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-vllm:0.17.0-transformers4.57.5-gpu-py312-cu129-ubuntu22.04 | GPU |
| 0.21.0 | 763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-vllm:0.21.0-transformers5.8.1-gpu-py312-cu130-ubuntu22.04 | GPU |
| 0.11.0 | 763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-vllm-inference-neuronx:0.11.0-optimum0.4.5-neuronx-py310-sdk2.26.1-ubuntu22.04 | Neuron |
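The container URIs in the table above follow the usual Amazon ECR layout, `<account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>`. As a minimal sketch, the helpers below split a URI into those parts and rebuild it for another region. The names `EcrImage`, `parse_ecr_uri`, and `with_region` are hypothetical and not part of any SDK, and whether a given image is actually published in the target region is an assumption you should verify before deploying.

```python
from typing import NamedTuple


class EcrImage(NamedTuple):
    """Components of an ECR image URI (hypothetical helper type)."""

    account: str
    region: str
    repository: str
    tag: str


def parse_ecr_uri(uri: str) -> EcrImage:
    """Split an ECR image URI into account, region, repository, and tag."""
    registry, _, image = uri.partition("/")
    repository, _, tag = image.rpartition(":")
    parts = registry.split(".")
    # Expected registry layout: <account>.dkr.ecr.<region>.amazonaws.com
    if len(parts) != 6 or parts[1:3] != ["dkr", "ecr"] or not repository:
        raise ValueError(f"not an ECR image URI: {uri!r}")
    return EcrImage(parts[0], parts[3], repository, tag)


def with_region(uri: str, region: str) -> str:
    """Rebuild the URI for another region.

    Assumption: the same repository and tag exist in that region.
    """
    image = parse_ecr_uri(uri)
    return (
        f"{image.account}.dkr.ecr.{region}.amazonaws.com/"
        f"{image.repository}:{image.tag}"
    )


if __name__ == "__main__":
    uri = (
        "763104351884.dkr.ecr.us-west-2.amazonaws.com/"
        "huggingface-vllm:0.21.0-transformers5.8.1-gpu-py312-cu130-ubuntu22.04"
    )
    print(parse_ecr_uri(uri))
    print(with_region(uri, "eu-west-1"))
```

Parsing the tag this way also makes it straightforward to check the bundled vLLM, Transformers, Python, and CUDA versions before picking an image.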

### vLLM Omni

You can also use vLLM Omni for serving multimodal models with vLLM on GPUs.

| vLLM Omni version | Container URI | Accelerator |
| ----------------- | -------------------------------------------------------------------------------------------------------------------------------- | ----------- |
| 0.20.0 | 763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-vllm-omni:0.20.0-transformers5.8.1-gpu-py312-cu130-amzn2023 | GPU |


### SGLang

There is also a specific DLC for serving models with SGLang on GPU.
@@ -50,6 +59,16 @@ There is also a specific DLC for serving models with SGLang on GPU.
| 0.5.8 | 763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-sglang:0.5.8-transformers4.57.3-gpu-py312-cu129-ubuntu24.04 | GPU |


### Llama.cpp

For lightweight inference serving, there is a specific DLC for serving models with Llama.cpp on both CPU and GPU.

| Llama.cpp version | Container URI | Accelerator |
| ----------------- | -------------------------------------------------------------------------------------------------------------------------------- | ----------- |
| b9522 | 763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-llama.cpp:b9522-gpu-cu130-ubuntu24.04 | GPU |
| b9522 | 763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-llama.cpp:b9522-cpu-ubuntu24.04 | CPU |


### Text Embeddings Inference

Finally, there is the Text Embeddings Inference (TEI) DLC for high-performance serving of embedding models on CPU and GPU.