5 changes: 5 additions & 0 deletions _includes/feature-notes/runtime-reindex.mdx
@@ -0,0 +1,5 @@
:::caution Preview — added in `v1.38`

This is a preview feature. The REST shape and behavior may change before GA. Do not rely on backup/restore while a reindex is in flight or recently completed on a v1.38 Preview cluster — wait for all tasks to reach `ready` / `failed` / `cancelled` first.

:::
2 changes: 1 addition & 1 deletion docs/weaviate/concepts/filtering.md
@@ -80,7 +80,7 @@ The `indexRangeFilters` index is a range-based index for filtering by numerical

Internally, rangeable indexes are implemented as roaring bitmap slices. This data structure limits the index to values that can be stored as 64 bit integers.

`indexRangeFilters` is only available for new properties. Existing properties cannot be converted to use the rangeable index.
Before `v1.38`, `indexRangeFilters` was only available for new properties — existing properties could not be converted to use the rangeable index. From `v1.38`, you can add a rangeable index to an existing property on a populated collection without restart using the [runtime reindex](../manage-collections/inverted-index.mdx#reindex-a-property-on-a-collection-v138) endpoints.

## Recall on Pre-Filtered Searches

10 changes: 9 additions & 1 deletion docs/weaviate/concepts/indexing/inverted-index.md
@@ -50,7 +50,7 @@ import BlockmaxWand from '/_includes/feature-notes/blockmax-wand.mdx';

The BlockMax WAND algorithm is a variant of the WAND algorithm that is used to speed up BM25 and hybrid searches. It organizes the inverted index in blocks to enable skipping over blocks that are not relevant to the query. This can significantly reduce the number of documents that need to be scored, improving search performance.

If you are experiencing slow BM25 (or hybrid) searches and use a Weaviate version prior to `v1.30`, try migrating to a newer version that uses the BlockMax WAND algorithm to see if it improves performance. If you need to migrate existing data from a previous version of Weaviate, follow the [v1.30 migration guide](/deploy/migration/weaviate-1-30.md).
If you are experiencing slow BM25 (or hybrid) searches and use a Weaviate version prior to `v1.30`, try migrating to a newer version that uses the BlockMax WAND algorithm to see if it improves performance. If you need to migrate existing data from a previous version of Weaviate, follow the [v1.30 migration guide](/deploy/migration/weaviate-1-30.md) — or on `v1.38+`, use the live [Reindex a property](/weaviate/manage-collections/inverted-index.mdx#migrate-bm25-from-wand-to-blockmax) endpoint to migrate without restart.

:::note Scoring changes with BlockMax WAND

@@ -168,6 +168,14 @@ An example of a complete collection object without inverted indexes:

</details>

### Changing an index after import

Because an inverted index is built at import time, a property created without one (or with the "wrong" tokenization or BM25 algorithm) historically required exporting the data, recreating the collection, and re-importing — an expensive, downtime-prone operation.

From `v1.38`, Weaviate can **reindex a property on a collection** instead. A reindex builds the new bucket in the background from object storage while the existing index keeps serving reads, then atomically flips a single schema flag once every replica has finished. The schema flag is the source of truth: if the rebuild fails, the flag is never flipped and the property stays in its pre-migration state, and an interrupted reindex is picked up automatically after a node restart. This makes adding a missing index, changing tokenization, or migrating BM25 from WAND to BlockMax a non-destructive, restart-safe operation.

For the operational steps, see [How-to: Reindex a property on a collection](/weaviate/manage-collections/inverted-index.mdx#reindex-a-property-on-a-collection-v138); for the endpoint reference, see [References: Runtime reindex](/weaviate/config-refs/indexing/inverted-index.mdx#runtime-reindex-v138-preview).

## Tokenization

Tokenization is the process of breaking text into smaller units called tokens. This process is fundamental to how inverted indexes work - the tokens produced determine what can be searched and how matching occurs.
96 changes: 93 additions & 3 deletions docs/weaviate/config-refs/indexing/inverted-index.mdx
@@ -257,8 +257,99 @@ You can drop (delete) an inverted index from a property. This is a destructive o

The following index types can be dropped: `searchable`, `filterable`, `rangeFilters`.

REST: `DELETE /v1/schema/{className}/properties/{propertyName}/index/{indexName}`. From `v1.38`, the drop is gated by the [runtime-reindex](#runtime-reindex-v138-preview) MutationGuard: the request is rejected while a reindex is in flight on the same property.

See [How-to: Drop an inverted index](../../manage-collections/inverted-index.mdx#drop-an-inverted-index) for code examples.

## Runtime reindex (v1.38 Preview)

import RuntimeReindexPreview from "/_includes/feature-notes/runtime-reindex.mdx";

<RuntimeReindexPreview/>

From `v1.38`, three REST endpoints let you alter a property's inverted-index configuration on a collection without restart. For task-oriented walkthroughs with `curl` examples, see [How-to: Reindex a property on a collection](../../manage-collections/inverted-index.mdx#reindex-a-property-on-a-collection-v138). This section is the endpoint and behavior reference.

### Endpoints

| Method | Path | Purpose |
|---|---|---|
| `PUT` | `/v1/schema/{class}/indexes/{property}` | Add an inverted index, change tokenization, migrate BM25 WAND → BlockMax, rebuild a bucket, or cancel an in-flight task. The body shape selects the migration type. |
| `DELETE` | `/v1/schema/{class}/properties/{property}/index/{indexName}` | Drop a configured index. `indexName` ∈ `{filterable, searchable, rangeFilters}`. |
| `GET` | `/v1/schema/{class}/indexes` | Read per-property index status. |

### Request body shapes (`PUT`)

| Body | Effect |
|---|---|
| `{"filterable":{"enabled":true}}` | Creates a `RoaringSet` bucket and flips `IndexFilterable=true`. |
| `{"searchable":{"enabled":true,"tokenization":"word"}}` | Creates a BlockMax searchable bucket, sets `Tokenization`, flips `IndexSearchable=true`. Requires `text` / `text[]`. |
| `{"rangeable":{"enabled":true}}` | Creates a `RoaringSetRange` bucket and flips `IndexRangeFilters=true`. Numeric types only (`int`, `number`, `date`). |
| `{"searchable":{"tokenization":"trigram"}}` | Retokenizes a populated `text` property. If a filterable bucket also exists, both are retokenized together. |
| `{"searchable":{"algorithm":"blockmax"}}` | Migrates every searchable property on the class from WAND to BlockMax. One-way; flipping back to `wand` is rejected. |
| `{"<bucket>":{"rebuild":true}}` | Rebuilds the named bucket (`filterable` / `searchable` / `rangeable`) from object storage. `searchable.rebuild` requires the property to already be on BlockMax. |
| `{"<bucket>":{"cancel":true}}` | Cancels an in-flight task on the property. Idempotent — returns `NO_OP` when nothing matches. |

A successful submit returns `202 Accepted` with `{"status":"STARTED","taskId":"<id>"}`.
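
As a rough illustration of how a body shape from the table above maps onto a request, the following Python sketch builds (but does not send) a `PUT` to the reindex endpoint using only the standard library. The base URL, the `Article` collection, the `wordCount` property, and the `build_reindex_request` helper are all hypothetical names chosen for the example; authentication headers are omitted.

```python
import json
import urllib.request
from urllib.parse import quote

# Assumption: a local Weaviate instance without authentication.
BASE_URL = "http://localhost:8080"


def build_reindex_request(collection, prop, body, base_url=BASE_URL):
    """Build (but do not send) a PUT request for the reindex endpoint."""
    url = f"{base_url}/v1/schema/{quote(collection)}/indexes/{quote(prop)}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical collection and property; the body shape comes from the table above.
req = build_reindex_request("Article", "wordCount", {"rangeable": {"enabled": True}})
```

Passing `req` to `urllib.request.urlopen` would submit the task. Building the request separately keeps the body shape easy to inspect before anything is sent to a Preview endpoint.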

### Status values (`GET`)

`GET /v1/schema/{class}/indexes` returns per-property index status. Each index reports one of:

| `status` | Meaning |
|---|---|
| `ready` | Index is live and serving. No migration in flight. |
| `pending` | A task has been accepted; per-shard work has not started yet. |
| `indexing` | Per-shard work is running. `progress` is a 0..1 estimate. |
| `failed` | At least one node reported `Success=false`. The schema flag was **not** flipped — the property is still in its pre-migration state. Submit a new task to retry. |
| `cancelled` | Operator cancelled. Partial state has been scrubbed. |

Between "task FINISHED" and "schema flag flipped", the response keeps the index at `indexing@100%` for a 3–10-second finalize window so the status never blinks back to a pre-migration shape before the schema catches up.
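
One way to consume these status values is a polling loop that waits until every index reports `ready`, `failed`, or `cancelled`. The sketch below is a minimal illustration, not part of any Weaviate client. The `fetch_statuses` callable is a hypothetical, caller-supplied function that performs the `GET` and extracts the status strings, because the exact response layout is not described in this section.

```python
import time

# Terminal states from the status table; "pending" and "indexing" are not terminal.
TERMINAL = {"ready", "failed", "cancelled"}


def wait_until_settled(fetch_statuses, interval=2.0, timeout=600.0, sleep=time.sleep):
    """Poll until every reported index status is terminal.

    fetch_statuses is a hypothetical caller-supplied function that returns an
    iterable of status strings parsed from GET /v1/schema/{class}/indexes.
    """
    deadline = time.monotonic() + timeout
    while True:
        statuses = list(fetch_statuses())
        if all(status in TERMINAL for status in statuses):
            return statuses
        if time.monotonic() >= deadline:
            raise TimeoutError("indexes did not settle before the timeout")
        sleep(interval)
```

Because the loop only stops on terminal values, an index held at `indexing` during the finalize window is still treated as in progress.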

### Concurrency

- **Per `(class, property)` exclusivity**: only one migration may be in flight on a given `(collection, property)` pair. A second submit on the same pair returns `409` with the offending task ID.
- **Per-class cap**: up to 32 concurrent migrations per class. Once the cap is reached, further submits return `429 Too Many Requests` until in-flight migrations drain.
- **Different properties on the same class run in parallel.** Different classes are fully independent.

While a reindex is in flight on `(class, property)`, the schema FSM rejects the following operations, and the rejection message names the in-flight task:

- `UpdateProperty` on the same property
- `DeleteClass` on the affected class
- `DeleteTenants` / `UpdateTenants` calls that make targeted shards locally unavailable (HOT → COLD / FROZEN / OFFLOADED)
- `DELETE …/index/{indexName}` on the same property

A reindex on a **different** property of the same class is not blocked.

### Multi-tenancy

Scope a task to specific tenants on a multi-tenant class with the `?tenants=` query parameter (comma-separated). The rules:

| Class | `?tenants=` | Migration type | Result |
|---|---|---|---|
| Single-tenant | provided | any | `400` |
| Multi-tenant | omitted | format-only (`rebuild`, `repair`, `enable-rangeable`) | targets all tenants |
| Multi-tenant | omitted | semantic (`change-tokenization*`, `enable-filterable`, `enable-searchable`, `change-algorithm` (BM25 WAND → BlockMax)) | targets all tenants (the schema flip is cluster-wide) |
| Multi-tenant | provided | format-only | targets the named subset |
| Multi-tenant | provided | semantic | `400` — semantic migrations cannot be sub-scoped |
| any | tenant in `OFFLOADED` / `FROZEN` | any | `400` — the offending tenant is named in the error |

Each tenant's replicas form an independent barrier group: tenant A starts serving the new bucket as soon as its own replicas finish, even if tenant B is still reindexing.
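
The scoping rules in the table can be read as a small decision function. The Python sketch below is only an illustration of those rules, not the server's implementation. The function name, the return format, and the order in which the checks run are assumptions, and any migration type not listed as format-only is treated as semantic.

```python
from typing import Optional

# Migration types the table classifies as format-only; all others are
# treated as semantic in this sketch.
FORMAT_ONLY = {"rebuild", "repair", "enable-rangeable"}
UNAVAILABLE_STATES = {"OFFLOADED", "FROZEN"}


def resolve_tenant_scope(
    multi_tenant: bool,
    migration_type: str,
    tenants: Optional[list] = None,
    tenant_states: Optional[dict] = None,
):
    """Return (http_status, detail) for a submit, following the rules table.

    tenants is the parsed ?tenants= value (None when omitted); tenant_states
    maps tenant name to its activity status. The check order is an assumption.
    """
    if tenants is None:
        # Omitted: both format-only and semantic migrations target everything.
        scope = "all tenants" if multi_tenant else "collection"
        return 202, scope
    if not multi_tenant:
        return 400, "?tenants= is not valid on a single-tenant class"
    if migration_type not in FORMAT_ONLY:
        return 400, "semantic migrations cannot be sub-scoped"
    for name in tenants:
        if (tenant_states or {}).get(name) in UNAVAILABLE_STATES:
            return 400, f"tenant {name} is OFFLOADED or FROZEN"
    return 202, tenants
```

For example, `resolve_tenant_scope(True, "rebuild", ["tenantA"])` accepts the named subset, while the same call with `"enable-filterable"` is rejected because the schema flip is cluster-wide.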

### Errors and recovery

| Code | When | Resolution |
|---|---|---|
| `400` | Malformed body, wrong property type, missing prerequisite (e.g. `searchable.tokenization` on a property with no searchable index), `?tenants=` on a single-tenant class, `?tenants=` on a semantic migration, target tenant in `OFFLOADED` / `FROZEN`. | Error responses carry next-step hints — read them. |
| `404` | Class or property doesn't exist. | Verify the class + property names. |
| `409` | An in-flight task overlaps this `(collection, property)`. The error names the offending task ID and migration type. | Wait, or cancel the existing task first. |
| `429` | Per-class in-flight cap reached (32 concurrent migrations). | Retry once in-flight migrations drain. |
| `503` | Cluster service temporarily unavailable. | Retry. |

The schema flag is the source of truth: if a task ends in `failed`, the flag was not flipped and the property is still in its pre-migration state. A reindex is restart-safe at every phase — in-flight migrations are picked up automatically after a node restart.
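
A client might act on the error table by retrying only the codes whose resolution is "retry" (`429` and `503`) and returning everything else to the caller. The sketch below assumes a hypothetical `submit` callable that sends the `PUT` and returns the HTTP status code. The exponential backoff is a design choice for the example, not something the endpoint prescribes.

```python
import time

# Codes the table resolves with a retry: cap reached / service unavailable.
RETRYABLE = {429, 503}


def submit_with_retry(submit, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call submit() until it returns a non-retryable status code.

    submit is a hypothetical caller-supplied function that sends the PUT and
    returns the HTTP status code. Backoff doubles after each retryable reply.
    """
    code = None
    for attempt in range(max_attempts):
        code = submit()
        if code not in RETRYABLE:
            return code
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return code
```

A `409` is deliberately not retried here: per the table, the caller should wait for the named task or cancel it first. A `400` or `404` needs a corrected request rather than a retry.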

### Required permissions

| Endpoint | Required permission |
|---|---|
| `GET /v1/schema/{class}/indexes` | `READ` on `CollectionsMetadata` |
| `PUT /v1/schema/{class}/indexes/{property}` | `UPDATE` on `Collections` |
| `DELETE /v1/schema/{class}/properties/{property}/index/{indexName}` | `UPDATE` on `CollectionsMetadata` |

`PUT` is intentionally stricter than the other endpoints: submitting a reindex task rebuilds buckets on every replica and flips schema flags, so it requires `UPDATE` on `Collections` — the same permission that gates `UpdateClass` and replication-factor changes. `DELETE` shares the existing schema-metadata permission (`UPDATE` on `CollectionsMetadata`) used by the other property-management endpoints. There is no dedicated `reindex` role today.

## How Weaviate creates inverted indexes

Weaviate creates **separate inverted indexes for each property and each index type**. For example, if you have a `title` property that is both searchable and filterable,
@@ -274,9 +365,8 @@ This is caused by the inverted index being built at import time. If you add a pr
To avoid this, you can either:

- Add the property before importing objects.
- Delete the collection, re-create it with the new property and then re-import the data.

We are working on a re-indexing API to allow you to re-index the data after adding a property. This will be available in a future release.
- From `v1.38`, use the [runtime reindex](#runtime-reindex-v138-preview) endpoints to add or change an inverted index on the collection without restart.
- On versions before `v1.38`, delete the collection, re-create it with the new property, and then re-import the data.

## How tokenization affects inverted indexing

5 changes: 2 additions & 3 deletions docs/weaviate/manage-collections/collection-operations.mdx
@@ -595,9 +595,8 @@ Property indexes are built at import time. If you add a new property after impor
To create an index that includes all of the objects in a collection, do one of the following:

- New collections: Add all of the collection's properties before importing objects.
- Existing collections: Export the existing data from the collection. Re-create it with the new property. Import the data into the updated collection.

We are working on a re-indexing API to allow you to re-index the data after adding a property. This will be available in a future release.
- Existing collections (v1.38+): Use the runtime [Reindex a property](./inverted-index.mdx#reindex-a-property-on-a-collection-v138) endpoints to add or change inverted indexes on a collection without restart.
- Existing collections (pre-v1.38): Export the existing data from the collection, recreate it with the new property, and re-import the data into the updated collection.

</details>
