Merged
2 changes: 1 addition & 1 deletion docs/developers/hami-core-design.md
@@ -4,7 +4,7 @@ title: HAMi-core design

HAMi-core is a hook library designed for the CUDA environment.
It functions as an in-container GPU resource controller and has been adopted by projects
like [HAMi](https://github.com/HAMi-project/HAMi) and [Volcano](https://github.com/volcano-sh/devices).
like [HAMi](https://github.com/Project-HAMi/HAMi) and [Volcano](https://github.com/volcano-sh/devices).

![HAMi-core architecture diagram showing GPU resource controller design](/img/docs/common/developers/hami-core-design/hami-arch.png)
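
As an illustration of the kind of in-container enforcement such a controller performs, here is a hypothetical accounting sketch (not HAMi-core's actual code, which is a C hook library that intercepts CUDA calls):

```python
class DeviceMemoryLimiter:
    """Hypothetical sketch: track in-container device-memory use against a quota.

    A real hook library intercepts the allocation call itself; this only
    models the accounting decision it makes.
    """

    def __init__(self, limit_bytes: int) -> None:
        self.limit = limit_bytes
        self.used = 0

    def try_alloc(self, size: int) -> bool:
        # Reject any allocation that would push usage past the container's
        # quota (a real hook would surface this to the app as a CUDA
        # out-of-memory error).
        if self.used + size > self.limit:
            return False
        self.used += size
        return True


limiter = DeviceMemoryLimiter(limit_bytes=1024)
print(limiter.try_alloc(512))   # True: fits within the quota
print(limiter.try_alloc(1024))  # False: would exceed the quota
```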

2 changes: 1 addition & 1 deletion docs/developers/protocol.md
@@ -29,7 +29,7 @@ hami.io/node-nvidia-register: GPU-00552014-5c87-89ac-b1a6-7b53aa24b0ec,10,32768,

```

In this example, this node has two different AI devices, 2 Nvidia-V100 GPUs, and 2 Cambircon 370-X4 MLUs
In this example, this node has two different AI devices, 2 Nvidia-V100 GPUs, and 2 Cambricon 370-X4 MLUs

A device node may become unavailable due to hardware or network failure. If a node hasn't registered in the last 5 minutes, the scheduler marks it as 'unavailable'.
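
The timeout rule above can be sketched as follows (a hypothetical helper, not the scheduler's actual implementation):

```python
from datetime import datetime, timedelta

# Per the rule above: no registration in the last 5 minutes -> 'unavailable'.
REGISTRATION_TIMEOUT = timedelta(minutes=5)


def node_available(last_registered: datetime, now: datetime) -> bool:
    """A node stays available only if it re-registered within the timeout."""
    return now - last_registered <= REGISTRATION_TIMEOUT


now = datetime(2024, 1, 1, 12, 0)
print(node_available(now - timedelta(minutes=3), now))  # True
print(node_available(now - timedelta(minutes=6), now))  # False: marked 'unavailable'
```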

34 changes: 17 additions & 17 deletions docs/developers/scheduling.md
@@ -4,7 +4,7 @@ title: Scheduler Policy

## Summary

Current in a cluster with many GPU nodes, nodes are not `binpack` or `spread` when making scheduling decisions, nor are GPU cards `binpack` or `spread` when using vGPU.
Currently, in a cluster with many GPU nodes, nodes are not `binpack` or `spread` when making scheduling decisions, nor are GPU cards `binpack` or `spread` when using vGPU.

## Proposal

@@ -26,12 +26,12 @@ node binpack, use one node’s GPU card whenever possible, e.g.:
- node2: GPU having 4 GPU device

- request:
- pod1: User 1 GPU
- pod2: User 1 GPU
- pod1: Use 1 GPU
- pod2: Use 1 GPU

- scheduler result:
- pod1: scheduler to node1
- pod2: scheduler to node1
- pod1: scheduled to node1
- pod2: scheduled to node1

#### Story 2

@@ -42,12 +42,12 @@ node spread, use GPU cards from different nodes as much as possible, e.g.:
- node2: GPU having 4 GPU device

- request:
- pod1: User 1 GPU
- pod2: User 1 GPU
- pod1: Use 1 GPU
- pod2: Use 1 GPU

- scheduler result:
- pod1: scheduler to node1
- pod2: scheduler to node2
- pod1: scheduled to node1
- pod2: scheduled to node2

#### Story 3

@@ -57,12 +57,12 @@ GPU binpack, use the same GPU card as much as possible, e.g.:
- node1: GPU having 4 GPU device, they are GPU1,GPU2,GPU3,GPU4

- request:
- pod1: User 1 GPU, gpucore is 20%, gpumem-percentage is 20%
- pod2: User 1 GPU, gpucore is 20%, gpumem-percentage is 20%
- pod1: Use 1 GPU, gpucore is 20%, gpumem-percentage is 20%
- pod2: Use 1 GPU, gpucore is 20%, gpumem-percentage is 20%

- scheduler result:
- pod1: scheduler to node1, select GPU1 this device
- pod2: scheduler to node1, select GPU1 this device
- pod1: scheduled to node1, select GPU1
- pod2: scheduled to node1, select GPU1

#### Story 4

@@ -72,12 +72,12 @@ GPU spread, use different GPU cards when possible, e.g.:
- node1: GPU having 4 GPU device, they are GPU1,GPU2,GPU3,GPU4

- request:
- pod1: User 1 GPU, gpucore is 20%, gpumem-percentage is 20%
- pod2: User 1 GPU, gpucore is 20%, gpumem-percentage is 20%
- pod1: Use 1 GPU, gpucore is 20%, gpumem-percentage is 20%
- pod2: Use 1 GPU, gpucore is 20%, gpumem-percentage is 20%

- scheduler result:
- pod1: scheduler to node1, select GPU1 this device
- pod2: scheduler to node1, select GPU2 this device
- pod1: scheduled to node1, select GPU1
- pod2: scheduled to node1, select GPU2
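
Across all four stories the preference reduces to one scoring rule: binpack favors the most-used candidate, spread the least-used. A hypothetical sketch (not the actual HAMi scheduler code) that works over either nodes or GPU devices:

```python
def pick(candidates: dict, policy: str) -> str:
    """candidates maps a node or GPU name to its current allocation count.

    binpack prefers the busiest candidate; spread prefers the idlest.
    Ties resolve to the first name in insertion order.
    """
    if policy == "binpack":
        return max(candidates, key=lambda name: candidates[name])
    return min(candidates, key=lambda name: candidates[name])


usage = {"node1": 1, "node2": 0}
print(pick(usage, "binpack"))  # node1 (story 1: pack onto the busier node)
print(pick(usage, "spread"))   # node2 (story 2: spread to the idler node)
```

The same preference applies one level down when selecting a GPU device within a node, as in stories 3 and 4.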

## Design Details

2 changes: 1 addition & 1 deletion docs/installation/how-to-use-volcano-vgpu.md
@@ -95,7 +95,7 @@ status:

### Running vGPU Jobs

vGPU can be requested by both set "volcano.sh/vgpu-number", "volcano.sh/vgpu-cores" and "volcano.sh/vgpu-memory" in resources.limits.
vGPU can be requested by setting `volcano.sh/vgpu-number`, `volcano.sh/vgpu-cores` and `volcano.sh/vgpu-memory` in `resources.limits`.

```shell
cat <<EOF | kubectl apply -f -
```
@@ -1,5 +1,5 @@
---
title: Allocate AWS Neuron core
title: Allocate AWS Neuron device
---

To allocate one or more AWS Neuron devices exclusively, use `aws.amazon.com/neuron`
@@ -1,8 +1,8 @@
---
title: Allocate exclusive BI-V100 device
title: Allocate exclusive MR-V100 device
---

To allocate multiple BI-V100 devices, you only need to assign `iluvatar.ai/BI-V150-vgpu` with no other fields required.
To allocate multiple MR-V100 devices, you only need to assign `iluvatar.ai/MR-V100-vgpu` with no other fields required.

```yaml
apiVersion: v1
```
2 changes: 1 addition & 1 deletion docs/userguide/kunlunxin-device/enable-kunlunxin-vxpu.md
@@ -155,7 +155,7 @@ spec:
resources:
limits:
kunlunxin.com/vxpu: 1 # requesting a VXPU
kunlunxin.com/vxpu-memory: 24576 # requesting a virtual XPU that requires 24576 MiB of device memorymemory
kunlunxin.com/vxpu-memory: 24576 # requesting a virtual XPU that requires 24576 MiB of device memory
```

## Device UUID Selection
@@ -2,7 +2,7 @@
title: Allocate device core and memory resource
---

To allocate a certain part of device core resource, you need only to assign the `mthreads.com/sgpu-memory` and `mthreads.com/sgpu-core` along with the number of Cambricon MLUs you requested in the container using `mthreads.com/vgpu`.
To allocate a certain part of device core resource, you need only to assign the `mthreads.com/sgpu-memory` and `mthreads.com/sgpu-core` along with the number of Mthreads GPUs you requested in the container using `mthreads.com/vgpu`.

```yaml
apiVersion: v1
```
@@ -2,7 +2,7 @@
title: Allocate exclusive device
---

To allocate a whole cambricon device, you need to only assign `mthreads.com/vgpu` without other fields. You can allocate multiple GPUs for a container.
To allocate a whole Mthreads device, you need to only assign `mthreads.com/vgpu` without other fields. You can allocate multiple GPUs for a container.

```yaml
apiVersion: v1
```
@@ -4,7 +4,7 @@ linktitle: Allocate device core usage
---

Allocate a part of device core resources by specifying resource `mthreads.com/sgpu-core`.
Optional, each unit of `mthreads.com/smlu-core` equals 1/16 of device cores.
Optional, each unit of `mthreads.com/sgpu-core` equals 1/16 of device cores.
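
Since each unit is 1/16 of the device's cores, the requested unit count maps to a core fraction as in this hypothetical helper (the 16-unit cap per device is an assumption implied by the 1/16 ratio, not stated in the docs):

```python
def sgpu_core_fraction(units: int) -> float:
    # Each `mthreads.com/sgpu-core` unit is 1/16 of the device cores, per the
    # text above, so 16 units correspond to the whole device. The upper bound
    # here is an assumption implied by that ratio.
    if not 0 <= units <= 16:
        raise ValueError("a single device has at most 16 core units")
    return units / 16


print(sgpu_core_fraction(8))   # 0.5 -> half the device's cores
print(sgpu_core_fraction(16))  # 1.0 -> the whole device
```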

```yaml
resources:
```
@@ -2,7 +2,7 @@
title: Default vGPU Job
---

vGPU can be requested by both set "volcano.sh/vgpu-number", "volcano.sh/vgpu-cores" and "volcano.sh/vgpu-memory" in resources.limits
vGPU can be requested by setting `volcano.sh/vgpu-number`, `volcano.sh/vgpu-cores` and `volcano.sh/vgpu-memory` in `resources.limits`

```yaml
apiVersion: v1
```