4 changes: 2 additions & 2 deletions .github/workflows/ci.yml
@@ -148,7 +148,7 @@ jobs:
export MOONCAKE_STORE_LIB_DIR=$GITHUB_WORKSPACE/build/mooncake-store/src
export MOONCAKE_STORE_INCLUDE_DIR=$GITHUB_WORKSPACE/mooncake-store/include
# This job builds Mooncake with -DENABLE_ASAN=ON, so the C++ libraries
# the Rust crate links against carry undefined __asan_* references. Opt
# the Rust package links against carry undefined __asan_* references. Opt
# in to linking the ASan runtime; build.rs emits -lasan first, which
# keeps libasan first in the initial library list as ASan requires.
# Non-sanitized builds leave this unset and link without ASan.
@@ -179,7 +179,7 @@ jobs:
export LD_LIBRARY_PATH=$GITHUB_WORKSPACE/build/mooncake-common:$GITHUB_WORKSPACE/build/mooncake-store/src:$GITHUB_WORKSPACE/build/mooncake-transfer-engine/src:$GITHUB_WORKSPACE/build/mooncake-transfer-engine/src/common/base:$GITHUB_WORKSPACE/build/mooncake-common/etcd
export CGO_ENABLED=1
export CGO_CFLAGS="-I$GITHUB_WORKSPACE/mooncake-store/include -I$GITHUB_WORKSPACE/mooncake-transfer-engine/include"
export CGO_LDFLAGS="-L$GITHUB_WORKSPACE/build/mooncake-store/src -L$GITHUB_WORKSPACE/build/mooncake-store/src/cachelib_memory_allocator -L$GITHUB_WORKSPACE/build/mooncake-transfer-engine/src -L$GITHUB_WORKSPACE/build/mooncake-transfer-engine/src/common/base -L$GITHUB_WORKSPACE/build/mooncake-common -L$GITHUB_WORKSPACE/build/mooncake-common/etcd -lmooncake_store -lcachelib_memory_allocator -ltransfer_engine -lbase -lasio -letcd_wrapper -lstdc++ -lnuma -lglog -lgflags -libverbs -lmlx5 -ljsoncpp -lzstd -lcurl -luring -lasan -lm -lgcov -lxxhash"
export CGO_LDFLAGS="-L$GITHUB_WORKSPACE/build/mooncake-store/src -L$GITHUB_WORKSPACE/build/mooncake-store/src/cachelib_memory_allocator -L$GITHUB_WORKSPACE/build/mooncake-transfer-engine/src -L$GITHUB_WORKSPACE/build/mooncake-transfer-engine/src/common/base -L$GITHUB_WORKSPACE/build/mooncake-common -L$GITHUB_WORKSPACE/build/mooncake-common/etcd -lmooncake_store -lcachelib_memory_allocator -ltransfer_engine -lbase -lasio -letcd_wrapper -lstdc++ -lnuma -lglog -lgflags -libverbs -lmlx5 -ljsoncpp -lzstd -lcurl -luring -lasan -lm -lgcov -lxxhash -lyaml-cpp"
# Link cudart if CUDA is available (needed for D2H staging in mooncake_store)
if [ -d /usr/local/cuda/lib64 ]; then export CGO_LDFLAGS="$CGO_LDFLAGS -L/usr/local/cuda/lib64 -lcudart"; fi
ASAN_OPTIONS=detect_leaks=0:verify_asan_link_order=0 MC_METADATA_SERVER=http://127.0.0.1:8080/metadata go test -v ./tests/...
39 changes: 23 additions & 16 deletions docs/source/deployment/mooncake-store-deployment-guide.md
@@ -268,16 +268,29 @@ When tenant quota is enabled, `/metrics` also includes per-tenant quota gauges a

## Tenant Quota Management

Tenant quota admission is disabled by default. Enable it on the master when you want memory writes admitted against per-tenant quota:
Tenant quota admission is disabled by default. Enable strict multi-tenant mode on the master when you want memory writes admitted against connector-managed per-tenant quota:

```bash
mooncake_master \
--enable_tenant_quota=true \
--default_tenant_quota_bytes=1073741824 \
--tenant_quota_pool_capacity_bytes=0
--enable_multi_tenants=true \
--tenant_quota_connector_type=file \
--tenant_quota_connector_uri=/etc/mooncake/tenant_quotas.yaml
```

`tenant_quota_pool_capacity_bytes=0` uses the full registered memory capacity as the quota allocation pool. A nonzero value caps the capacity used to compute effective tenant quotas.
The v1 connector is a writable YAML file. The file must use schema version `1`; tenant names must be non-empty, unique, must not start with `_`, and must not contain NUL or control characters; quotas must be positive integers with optional `B`, `KB`, `MB`, `GB`, or `TB` units:

```yaml
version: 1

tenants:
- name: tenant-a
quota: 200GB

- name: tenant-b
quota: 500GB
```
When strict multi-tenant mode is enabled, write requests must include a registered tenant. The `default` tenant is not special unless it is explicitly registered in the connector policy.
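
The policy rules above can be sketched as a small validator. This is a minimal sketch, not Mooncake's implementation: `parse_quota` and `validate_policy` are hypothetical names, the input is assumed to be the already-parsed YAML document (a plain dict), and the unit multipliers are assumed to be powers of 1024, which the guide does not specify.

```python
import re

# Assumption: units are binary multiples (1 KB = 1024 B). The guide lists the
# unit names but not their size; change UNIT_BASE to 1000 if that is wrong.
UNIT_BASE = 1024
UNITS = {"B": 0, "KB": 1, "MB": 2, "GB": 3, "TB": 4}
QUOTA_RE = re.compile(r"^([0-9]+)(B|KB|MB|GB|TB)?$")


def parse_quota(value):
    """Return the quota in bytes; raise ValueError unless it is a positive integer."""
    match = QUOTA_RE.match(str(value).strip())
    if match is None:
        raise ValueError(f"invalid quota: {value!r}")
    number, unit = int(match.group(1)), match.group(2) or "B"
    if number <= 0:
        raise ValueError("quota must be positive")
    return number * UNIT_BASE ** UNITS[unit]


def validate_policy(doc):
    """Check a parsed policy document and return {tenant_name: quota_bytes}."""
    if doc.get("version") != 1:
        raise ValueError("unsupported schema version")
    quotas = {}
    for entry in doc.get("tenants") or []:
        name = entry.get("name")
        if not isinstance(name, str) or not name:
            raise ValueError("tenant name must be a non-empty string")
        if name.startswith("_"):
            raise ValueError(f"tenant name must not start with '_': {name!r}")
        # Control characters: code points below 0x20 (including NUL) and 0x7F.
        if any(ord(ch) < 0x20 or ord(ch) == 0x7F for ch in name):
            raise ValueError("tenant name contains a control character")
        if name in quotas:
            raise ValueError(f"duplicate tenant: {name!r}")
        quotas[name] = parse_quota(entry.get("quota"))
    return quotas
```

Running a check like this before handing the file to the master catches duplicate names and malformed quotas early; the master's own validation remains authoritative.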

The same HTTP port used for metrics exposes the tenant quota admin API:

@@ -293,14 +306,8 @@ curl -s -X PUT "http://<master_host>:9003/api/v1/tenant_quotas?tenant_id=tenant-
-H 'Content-Type: application/json' \
-d '{"requested_quota_bytes":2147483648}'
# Delete an explicit policy so the tenant inherits the default policy again.
# Delete an explicit policy. The tenant must not own objects or quota usage.
curl -s -X DELETE "http://<master_host>:9003/api/v1/tenant_quotas?tenant_id=tenant-a"

# Query or update the default requested quota. The default may be 0.
curl -s http://<master_host>:9003/api/v1/tenant_quotas/default
curl -s -X PUT http://<master_host>:9003/api/v1/tenant_quotas/default \
-H 'Content-Type: application/json' \
-d '{"requested_quota_bytes":1073741824}'
```

Each tenant quota snapshot returns:
@@ -321,7 +328,7 @@ Each tenant quota snapshot returns:
}
```

In HA mode, quota admin requests are served only by the active master service. Standby, candidate, or inactive services return HTTP 503. If tenant quota is disabled, the quota admin API returns HTTP 409 with `UNAVAILABLE_IN_CURRENT_MODE`.
In HA mode, quota admin requests are served only by the active master service. Standby, candidate, or inactive services return HTTP 503. If strict multi-tenant mode is disabled, the quota admin API returns HTTP 409 with `UNAVAILABLE_IN_CURRENT_MODE`. Deleting a non-empty tenant returns HTTP 409 with `TENANT_NOT_EMPTY`.
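
For scripted administration, the same admin calls can be issued from Python's standard library. The sketch below only builds the requests and interprets the status codes described above. `build_quota_request` and `describe_status` are hypothetical helpers, and how the error code is carried in the response body is not stated here, so it is passed in as a plain string.

```python
import json
import urllib.parse
import urllib.request


def build_quota_request(base_url, method, tenant_id=None, requested_quota_bytes=None):
    """Build (but do not send) a request for the tenant quota admin API."""
    url = f"{base_url}/api/v1/tenant_quotas"
    if tenant_id is not None:
        url += "?" + urllib.parse.urlencode({"tenant_id": tenant_id})
    data, headers = None, {}
    if method == "PUT":
        body = {"requested_quota_bytes": requested_quota_bytes}
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


def describe_status(status, error_code=""):
    """Map the statuses described above to an operator-facing hint."""
    if status == 503:
        return "not the active master; retry against the active master"
    if status == 409 and error_code == "UNAVAILABLE_IN_CURRENT_MODE":
        return "strict multi-tenant mode is disabled on this master"
    if status == 409 and error_code == "TENANT_NOT_EMPTY":
        return "tenant still owns objects or quota usage; empty it first"
    return "ok" if 200 <= status < 300 else "unexpected response"
```

A request built this way can be sent with `urllib.request.urlopen(req)`; on a 4xx or 5xx response `urlopen` raises `urllib.error.HTTPError`, whose `code` attribute can be fed to `describe_status`.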

---

@@ -465,9 +472,9 @@ mooncake_master \

| Flag | Default | Description |
|------|---------|-------------|
| `--enable_tenant_quota` | `false` | Enable per-tenant memory quota admission |
| `--default_tenant_quota_bytes` | `0` | Default requested quota for tenants without explicit policy; `0` is allowed and inherited-default tenants still share capacity left by explicit tenants |
| `--tenant_quota_pool_capacity_bytes` | `0` | Capacity used to compute effective tenant quotas; `0` means total registered memory capacity |
| `--enable_multi_tenants` | `false` | Enable strict tenant registration and per-tenant memory quota admission |
| `--tenant_quota_connector_type` | `file` | Tenant quota policy connector type |
| `--tenant_quota_connector_uri` | empty | Connector URI; for `file`, the writable YAML policy path |

### High Availability

34 changes: 22 additions & 12 deletions docs/source/design/mooncake-store.md
@@ -93,33 +93,43 @@ To reduce cache warm-up time after a master restart, the Master Service supports

### Tenant Quota

The Master Service can optionally enforce memory quota admission per tenant. This feature is disabled by default. When `enable_tenant_quota=false`, existing allocation and eviction behavior is preserved, and tenant quota management requests return `UNAVAILABLE_IN_CURRENT_MODE`.
The Master Service can optionally enforce strict multi-tenant memory quota admission. This feature is disabled by default. When `enable_multi_tenants=false`, request tenant IDs are ignored for object placement, all objects use the `default` namespace, and tenant quota management requests return `UNAVAILABLE_IN_CURRENT_MODE`.

When tenant quota is enabled, each tenant has a requested quota policy and an effective quota. Explicit tenant policies are set through the master admin HTTP API. Tenants without an explicit policy inherit the default requested quota. The default policy may be `0`; explicit tenant policies must be positive.
When strict multi-tenant mode is enabled, the tenant quota policy is loaded from the configured connector. The v1 connector is a writable YAML file configured by `tenant_quota_connector_type=file` and `tenant_quota_connector_uri=<path>`. Tenants must be explicitly present in that connector policy before they can write. Missing tenants, empty tenants, and an unregistered `default` tenant are rejected with `TENANT_NOT_REGISTERED`.

Effective quota is recomputed from the current registered memory capacity and the optional tenant quota pool cap:
The YAML policy uses schema version `1`:

- The allocatable capacity is the smaller of total registered memory and `tenant_quota_pool_capacity_bytes`; a pool cap of `0` means total registered memory.
- If explicit tenant requests fit within the allocatable capacity, explicit tenants receive their requested quotas and the remaining capacity is split evenly among active inherited-default tenants.
- If explicit tenant requests exceed the allocatable capacity, only explicit tenants receive quota, scaled proportionally by request size. Inherited-default tenants receive `0` effective quota until capacity is available.
```yaml
version: 1

tenants:
- name: tenant-a
quota: 200GB
```

Tenant names must be non-empty, unique, must not start with `_`, and must not contain NUL or control characters. Quotas must be positive integers and may use `B`, `KB`, `MB`, `GB`, or `TB` units.

Effective quota is recomputed from the current registered memory capacity:

- If explicit tenant requests fit within the registered memory capacity, tenants receive their requested quotas and remaining capacity stays unallocated.
- If explicit tenant requests exceed registered memory capacity, explicit tenants receive quota scaled proportionally by request size.
- Remainders are assigned deterministically by tenant ID, so repeated recomputes produce stable results.
- Tenants present in restored metadata but missing from the connector policy become in-memory orphans with requested quota `0`, effective quota `0`, and `over_quota=true` while they still own metadata. Reads and removals are allowed so operators can clean them up; writes remain blocked until the tenant is re-registered or emptied.

`PutStart` and size-changing `UpsertStart` charge quota before memory is allocated. If the first reservation fails, the master performs tenant-scoped memory eviction for the target tenant and retries the reservation. The retry is bounded to two eviction attempts. Tenant quota eviction scans only the target tenant, skips hard-pinned objects, honors soft-pin eviction configuration, and preserves grouped-object lease safety checks.
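
The reserve, evict, and retry flow can be sketched as follows. This is a toy model under stated assumptions: `TenantAccount`, `reserve_with_eviction`, and the `evict` callback are hypothetical, and "bounded to two eviction attempts" is read here as at most two evict-then-retry rounds after the first failed reservation.

```python
MAX_EVICTION_ATTEMPTS = 2  # the text bounds the retry to two eviction attempts


class TenantAccount:
    """Toy per-tenant ledger; the real master tracks far more state."""

    def __init__(self, effective_quota, used=0):
        self.effective_quota = effective_quota
        self.used = used

    def try_reserve(self, size):
        """Charge `size` bytes if the tenant stays within its effective quota."""
        if self.used + size > self.effective_quota:
            return False
        self.used += size
        return True


def reserve_with_eviction(account, size, evict):
    """Charge quota before allocation, evicting within the tenant if needed.

    `evict(account, needed)` is a hypothetical callback that frees evictable
    bytes owned by this tenant only (lowering `account.used`); pin and lease
    checks would live inside it.
    """
    if account.try_reserve(size):
        return True
    for _ in range(MAX_EVICTION_ATTEMPTS):
        needed = account.used + size - account.effective_quota
        evict(account, needed)
        if account.try_reserve(size):
            return True
    return False
```

Because the callback only ever touches the target tenant's account, one tenant running out of quota cannot evict another tenant's objects in this model.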

The admin HTTP API exposes:
Admin policy changes are persisted before the final in-memory policy is applied. `PUT` writes the connector first and then applies the policy in memory. `DELETE` first marks the tenant unregistered in memory to block concurrent writes, verifies the tenant is empty, writes the connector, and rolls back the in-memory mark if the connector write fails. The admin HTTP API exposes:

| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/api/v1/tenant_quotas` | List quota snapshots for active or explicit tenants |
| `GET` | `/api/v1/tenant_quotas?tenant_id=<tenant>` | Query one tenant quota snapshot |
| `PUT` | `/api/v1/tenant_quotas?tenant_id=<tenant>` | Upsert an explicit tenant quota policy |
| `DELETE` | `/api/v1/tenant_quotas?tenant_id=<tenant>` | Delete an explicit tenant quota policy |
| `GET` | `/api/v1/tenant_quotas/default` | Query the default requested quota policy |
| `PUT` | `/api/v1/tenant_quotas/default` | Set the default requested quota policy |
| `PUT` | `/api/v1/tenant_quotas?tenant_id=<tenant>` | Create or update a tenant quota policy |
| `DELETE` | `/api/v1/tenant_quotas?tenant_id=<tenant>` | Delete an empty tenant quota policy |

Tenant quota snapshots include `tenant_id`, `requested_quota_bytes`, `effective_quota_bytes`, `used_bytes`, `reserved_bytes`, `committed_count`, `over_quota`, and `has_explicit_policy`.
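
The persist-then-apply ordering for `PUT` and the mark, verify, write, and rollback ordering for `DELETE` can be sketched as below. `QuotaAdmin`, `ConnectorWriteError`, and the connector's `save` method are hypothetical stand-ins, and the sketch is single-threaded; the real master would hold appropriate locks.

```python
class ConnectorWriteError(Exception):
    """Raised by the (hypothetical) connector when the policy cannot be written."""


class QuotaAdmin:
    def __init__(self, connector, policy, used_bytes):
        self.connector = connector    # object with save(policy_dict)
        self.policy = dict(policy)    # in-memory {tenant_id: requested_quota_bytes}
        self.used_bytes = used_bytes  # {tenant_id: bytes currently charged}
        self.unregistering = set()    # tenants whose writes are blocked by DELETE

    def put(self, tenant_id, quota_bytes):
        updated = {**self.policy, tenant_id: quota_bytes}
        self.connector.save(updated)  # persist first; on failure memory is untouched
        self.policy = updated

    def delete(self, tenant_id):
        if tenant_id not in self.policy:
            raise KeyError(tenant_id)
        self.unregistering.add(tenant_id)  # block concurrent writes first
        try:
            if self.used_bytes.get(tenant_id, 0) > 0:
                raise RuntimeError("TENANT_NOT_EMPTY")
            remaining = {t: q for t, q in self.policy.items() if t != tenant_id}
            self.connector.save(remaining)
            self.policy = remaining
        finally:
            # On failure this rolls the mark back; on success the tenant is
            # already gone from the policy, so the mark is no longer needed.
            self.unregistering.discard(tenant_id)
```

Writing the connector before touching the in-memory policy means a failed write never leaves the master serving a policy that is absent from the file.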

When snapshots are enabled, tenant quota policies are written to a separate `tenant_quota_policy` snapshot object. This file stores only requested policy: the default requested quota and explicit tenant requested quotas. Effective quota, usage, reservations, and committed object charge are rebuilt from restored metadata and current capacity. Older snapshots without `tenant_quota_policy` remain valid and use the configured default quota on restore.
Snapshots restore object runtime state only. Tenant quota policy is always loaded from the connector after metadata restore, then usage and effective quota are rebuilt from restored metadata and current registered capacity. If the connector cannot be loaded in strict multi-tenant mode, startup fails.
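
The restore order can be sketched in a few lines. The names below (`rebuild_after_restore`, `write_allowed`, and the `load_policy` callable) are hypothetical, usage is reduced to a byte count per tenant, and effective quota is left out because it is recomputed separately from current capacity.

```python
def rebuild_after_restore(load_policy, restored_usage):
    """Rebuild per-tenant state once object metadata has been restored.

    load_policy:    callable returning {tenant_id: requested_quota_bytes}.
                    Any exception it raises propagates, which models startup
                    failing when the connector cannot be loaded.
    restored_usage: {tenant_id: used_bytes} derived from restored metadata.
    """
    policy = load_policy()
    state = {}
    for tenant, requested in policy.items():
        state[tenant] = {
            "requested_quota_bytes": requested,
            "used_bytes": restored_usage.get(tenant, 0),
            "orphan": False,
        }
    for tenant, used in restored_usage.items():
        if tenant not in policy:
            # Owns metadata but is missing from the connector: in-memory orphan.
            state[tenant] = {
                "requested_quota_bytes": 0,
                "used_bytes": used,
                "orphan": True,
            }
    return state


def write_allowed(state, tenant):
    """Writes need a registered tenant; orphans may only be read or removed."""
    entry = state.get(tenant)
    return entry is not None and not entry["orphan"]
```

Keeping the policy out of the snapshot gives the connector file a single source of truth: editing the file and restarting is enough to change what the master enforces.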

### Master Service APIs

4 changes: 4 additions & 0 deletions mooncake-store/conf/master.yaml
@@ -13,6 +13,10 @@ allow_evict_soft_pinned_objects: true
eviction_ratio: 0.1
eviction_high_watermark_ratio: 1.0

enable_multi_tenants: false
tenant_quota_connector_type: "file"
tenant_quota_connector_uri: ""

enable_ha: false
etcd_endpoints: "http://localhost:2379"
root_fs_dir: ""
2 changes: 1 addition & 1 deletion mooncake-store/go/build.sh
@@ -41,7 +41,7 @@ CGO_LDFLAGS+=" -L${BUILD_DIR}/mooncake-transfer-engine/src"
CGO_LDFLAGS+=" -L${BUILD_DIR}/mooncake-transfer-engine/src/common/base"
CGO_LDFLAGS+=" -L${BUILD_DIR}/mooncake-common"
CGO_LDFLAGS+=" -L${BUILD_DIR}/mooncake-common/src"
CGO_LDFLAGS+=" -lmooncake_store -lcachelib_memory_allocator -ltransfer_engine -lbase -lasio -lmooncake_common -lxxhash"
CGO_LDFLAGS+=" -lmooncake_store -lcachelib_memory_allocator -ltransfer_engine -lbase -lasio -lmooncake_common -lxxhash -lyaml-cpp"
CGO_LDFLAGS+=" -lstdc++ -lnuma -lglog -lgflags -libverbs -lmlx5 -ljsoncpp -lzstd -lcurl -lm"

if [ -d "/usr/local/cuda/lib64" ]; then
4 changes: 0 additions & 4 deletions mooncake-store/include/master_admin_service.h
@@ -99,10 +99,6 @@ class MasterAdminServer {
coro_http::coro_http_response& resp);
void HandleDeleteTenantQuota(coro_http::coro_http_request& req,
coro_http::coro_http_response& resp);
void HandleGetDefaultTenantQuota(coro_http::coro_http_request& req,
coro_http::coro_http_response& resp);
void HandleSetDefaultTenantQuota(coro_http::coro_http_request& req,
coro_http::coro_http_response& resp);

void RegisterHandler();
