From 59d70a4ce0859003ee5487f80c4b3c7bd4b234c6 Mon Sep 17 00:00:00 2001
From: Ivan Despot <66276597+g-despot@users.noreply.github.com>
Date: Tue, 7 Apr 2026 14:52:59 +0200
Subject: [PATCH 1/4] Update rescoring explanation

---
 docs/weaviate/concepts/vector-quantization.md | 16 ++++++++++------
 .../managing-resources/compression.mdx        |  4 ++--
 2 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/docs/weaviate/concepts/vector-quantization.md b/docs/weaviate/concepts/vector-quantization.md
index 599de161..d13c4ab9 100644
--- a/docs/weaviate/concepts/vector-quantization.md
+++ b/docs/weaviate/concepts/vector-quantization.md
@@ -171,18 +171,22 @@ Learn more about how to [configure rotational quantization](../configuration/com
 
 ## Over-fetching / re-scoring
 
-Weaviate over-fetches results and then re-scores them when you use SQ, RQ, or BQ. This is because the distance calculation on the compressed vectors is not as accurate as the same calculation on the original vector embedding.
+All quantization methods in Weaviate use re-scoring to offset the recall loss caused by compression. The distance calculation on compressed vectors is not as accurate as on the original embeddings, so Weaviate fetches the original, uncompressed vectors for the result candidates and recalculates the distances.
 
-When you run a query, Weaviate compares the query limit against a configurable `rescoreLimit` parameter.
+### SQ, RQ, and BQ
 
-The query retrieves compressed objects until the object count reaches whichever limit is greater. Then, Weaviate fetches the original, uncompressed vector embeddings that correspond to the compressed vectors. The uncompressed vectors are used to recalculate the query distance scores.
+With SQ, RQ, and BQ, you can configure the amount of over-fetching using the `rescoreLimit` parameter. When you run a query, Weaviate compares the query `limit` against `rescoreLimit` and retrieves compressed objects up to whichever is greater. It then re-scores those candidates using the uncompressed vectors.
 
-For example, if a query is made with a limit of 10, and a rescore limit of 200, Weaviate fetches 200 objects. After rescoring, the query returns top 10 objects. This process offsets the loss in search quality (recall) that is caused by compression.
+For example, if a query has a limit of 10 and a rescore limit of 200, Weaviate fetches 200 objects, re-scores them, and returns the top 10.
 
 :::note RQ optimization
 With RQ's high native recall of 98-99%, you can often disable rescoring (set `rescoreLimit` to 0) for maximum query performance with minimal impact on search quality.
 :::
 
+### PQ
+
+PQ also performs over-fetching and re-scoring, but it handles this automatically — there is no `rescoreLimit` parameter to configure. During an HNSW search, PQ uses compressed vectors for the initial graph traversal and then re-scores the result candidates with the original uncompressed vectors stored on disk. For more details, see the [PQ rescoring blog post](https://weaviate.io/blog/pq-rescoring).
+
 ## Vector compression with vector indexing
 
 ### With an HNSW index
@@ -199,9 +203,9 @@ You might be also interested in our blog post [HNSW+PQ - Exploring ANN algorithm
 
 ## Rescoring
 
-Quantization inherently involves some loss information due to the reduction in information precision. To mitigate this, Weaviate uses a technique called rescoring, using the uncompressed vectors that are also stored alongside compressed vectors. Rescoring recalculates the distance between the original vectors of the returned candidates from the initial search. This ensures that the most accurate results are returned to the user.
+Quantization inherently involves some loss of information due to the reduction in precision. To mitigate this, all quantization methods (PQ, SQ, RQ, and BQ) use rescoring: Weaviate stores the original uncompressed vectors alongside the compressed ones and recalculates distances from the uncompressed vectors for the result candidates. This ensures that the most accurate results are returned to the user.
 
-In some cases, rescoring also includes over-fetching, whereby additional candidates are fetched to ensure that the top candidates are not omitted in the initial search.
+With SQ, RQ, and BQ, rescoring also includes configurable over-fetching via the `rescoreLimit` parameter, whereby additional candidates are fetched to ensure that the top results are not missed in the initial compressed search. PQ performs over-fetching and rescoring automatically. See [Over-fetching / re-scoring](#over-fetching--re-scoring) for details.
 
 ## Further resources
 
diff --git a/docs/weaviate/starter-guides/managing-resources/compression.mdx b/docs/weaviate/starter-guides/managing-resources/compression.mdx
index f8209da2..5b04e733 100644
--- a/docs/weaviate/starter-guides/managing-resources/compression.mdx
+++ b/docs/weaviate/starter-guides/managing-resources/compression.mdx
@@ -78,7 +78,7 @@ Typical recall rates:
 - RQ: 98-99% recall
 - BQ: Varies significantly based on data and model characteristics
 
-To improve recall with compressed vectors, Weaviate over-fetches a list of candidate vectors during a search. For each item on the candidate list, Weaviate fetches the corresponding uncompressed vector. To determine the final ranking, Weaviate calculates the distances from the uncompressed vectors to the query vector.
+To improve recall with compressed vectors, all quantization methods use re-scoring. Weaviate fetches the original, uncompressed vectors for the result candidates and recalculates distances to determine the final ranking. With SQ, RQ, and BQ, you can also configure over-fetching via the `rescoreLimit` parameter to retrieve additional candidates. PQ handles over-fetching and re-scoring automatically.
 
 import RescoringIllustration from "/docs/weaviate/starter-guides/managing-resources/img/rescore-uncompressed-vectors.png";
 
@@ -86,7 +86,7 @@ import RescoringIllustration from "/docs/weaviate/starter-guides/managing-resour
 
 The rescoring process is slower than an in-memory search, but since Weaviate only has to search a limited number of uncompressed vectors, the search is still very fast. Most importantly, rescoring with the uncompressed vectors greatly improves recall.
 
-The search algorithm uses over-fetching and rescoring so that you get the benefits of compression without losing the precision of an uncompressed vector search.
+The search algorithm uses re-scoring (and over-fetching where configured) so that you get the benefits of compression without losing the precision of an uncompressed vector search. For more details, see [Over-fetching / re-scoring](/weaviate/concepts/vector-quantization#over-fetching--re-scoring).
 
 #### Query speed
 
From 788c7291f2d409600c9ec8f3a18cd4ff3c9f0208 Mon Sep 17 00:00:00 2001
From: Ivan Despot <66276597+g-despot@users.noreply.github.com>
Date: Sun, 31 May 2026 18:36:53 +0200
Subject: [PATCH 2/4] Fix wording

---
 docs/weaviate/concepts/vector-quantization.md | 15 ++++++++++-----
 .../managing-resources/compression.mdx        |  2 +-
 2 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/docs/weaviate/concepts/vector-quantization.md b/docs/weaviate/concepts/vector-quantization.md
index d13c4ab9..70bd3748 100644
--- a/docs/weaviate/concepts/vector-quantization.md
+++ b/docs/weaviate/concepts/vector-quantization.md
@@ -173,9 +173,14 @@ Learn more about how to [configure rotational quantization](../configuration/com
 
 All quantization methods in Weaviate use re-scoring to offset the recall loss caused by compression. The distance calculation on compressed vectors is not as accurate as on the original embeddings, so Weaviate fetches the original, uncompressed vectors for the result candidates and recalculates the distances.
 
-### SQ, RQ, and BQ
+### Configurable over-fetching (`rescoreLimit`)
 
-With SQ, RQ, and BQ, you can configure the amount of over-fetching using the `rescoreLimit` parameter. When you run a query, Weaviate compares the query `limit` against `rescoreLimit` and retrieves compressed objects up to whichever is greater. It then re-scores those candidates using the uncompressed vectors.
+You can configure the amount of over-fetching using the `rescoreLimit` parameter for:
+
+- **SQ** and **RQ** on an HNSW index
+- **RQ** and **BQ** on a flat index
+
+When you run a query, Weaviate compares the query `limit` against `rescoreLimit` and retrieves compressed objects up to whichever is greater. It then re-scores those candidates using the uncompressed vectors.
 
 For example, if a query has a limit of 10 and a rescore limit of 200, Weaviate fetches 200 objects, re-scores them, and returns the top 10.
 
@@ -183,9 +188,9 @@ For example, if a query has a limit of 10 and a rescore limit of 200, Weaviate f
 With RQ's high native recall of 98-99%, you can often disable rescoring (set `rescoreLimit` to 0) for maximum query performance with minimal impact on search quality.
 :::
 
-### PQ
+### Automatic over-fetching (PQ, and BQ on HNSW)
 
-PQ also performs over-fetching and re-scoring, but it handles this automatically — there is no `rescoreLimit` parameter to configure. During an HNSW search, PQ uses compressed vectors for the initial graph traversal and then re-scores the result candidates with the original uncompressed vectors stored on disk. For more details, see the [PQ rescoring blog post](https://weaviate.io/blog/pq-rescoring).
+**PQ**, and **BQ on an HNSW index**, also perform over-fetching and re-scoring, but they handle it automatically — there is no `rescoreLimit` parameter to configure. During an HNSW search, the compressed vectors are used for the initial graph traversal, and the result candidates are then re-scored with the original uncompressed vectors stored on disk. For more details, see the [PQ rescoring blog post](https://weaviate.io/blog/pq-rescoring).
 
 ## Vector compression with vector indexing
 
@@ -205,7 +210,7 @@ You might be also interested in our blog post [HNSW+PQ - Exploring ANN algorithm
 
 Quantization inherently involves some loss of information due to the reduction in precision. To mitigate this, all quantization methods (PQ, SQ, RQ, and BQ) use rescoring: Weaviate stores the original uncompressed vectors alongside the compressed ones and recalculates distances from the uncompressed vectors for the result candidates. This ensures that the most accurate results are returned to the user.
 
-With SQ, RQ, and BQ, rescoring also includes configurable over-fetching via the `rescoreLimit` parameter, whereby additional candidates are fetched to ensure that the top results are not missed in the initial compressed search. PQ performs over-fetching and rescoring automatically. See [Over-fetching / re-scoring](#over-fetching--re-scoring) for details.
+Rescoring also includes over-fetching, whereby additional candidates are fetched to ensure that the top results are not missed in the initial compressed search. This over-fetching is configurable via the `rescoreLimit` parameter for SQ and RQ on an HNSW index, and for RQ and BQ on a flat index. PQ (and BQ on an HNSW index) perform over-fetching and rescoring automatically. See [Over-fetching / re-scoring](#over-fetching--re-scoring) for details.
 
 ## Further resources
 
diff --git a/docs/weaviate/starter-guides/managing-resources/compression.mdx b/docs/weaviate/starter-guides/managing-resources/compression.mdx
index 5b04e733..f3178fb5 100644
--- a/docs/weaviate/starter-guides/managing-resources/compression.mdx
+++ b/docs/weaviate/starter-guides/managing-resources/compression.mdx
@@ -78,7 +78,7 @@ Typical recall rates:
 - RQ: 98-99% recall
 - BQ: Varies significantly based on data and model characteristics
 
-To improve recall with compressed vectors, all quantization methods use re-scoring. Weaviate fetches the original, uncompressed vectors for the result candidates and recalculates distances to determine the final ranking. With SQ, RQ, and BQ, you can also configure over-fetching via the `rescoreLimit` parameter to retrieve additional candidates. PQ handles over-fetching and re-scoring automatically.
+To improve recall with compressed vectors, all quantization methods use re-scoring. Weaviate fetches the original, uncompressed vectors for the result candidates and recalculates distances to determine the final ranking. You can configure over-fetching via the `rescoreLimit` parameter for SQ and RQ on an HNSW index, and for RQ and BQ on a flat index. PQ (and BQ on an HNSW index) handle over-fetching and re-scoring automatically.
 
 import RescoringIllustration from "/docs/weaviate/starter-guides/managing-resources/img/rescore-uncompressed-vectors.png";
 
From 0b9835b8aba68cc84f64022aacfa34bcc03b7e6e Mon Sep 17 00:00:00 2001
From: Ivan Despot <66276597+g-despot@users.noreply.github.com>
Date: Mon, 1 Jun 2026 10:10:00 +0200
Subject: [PATCH 3/4] Correct rescoreLimit defaults in compression parameter tables

SQ defaults to 20 and RQ to 20/512 on HNSW (not -1, which is the flat
sentinel). Clarify what the -1 flat default does, and note that BQ
rescoreLimit only applies to flat indexes.
---
 _includes/configuration/bq-compression-parameters.mdx | 2 +-
 _includes/configuration/rq-compression-parameters.mdx | 2 +-
 _includes/configuration/sq-compression-parameters.mdx | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/_includes/configuration/bq-compression-parameters.mdx b/_includes/configuration/bq-compression-parameters.mdx
index 9552fb9b..8dafee4e 100644
--- a/_includes/configuration/bq-compression-parameters.mdx
+++ b/_includes/configuration/bq-compression-parameters.mdx
@@ -1,6 +1,6 @@
 | Parameter | Type | Default | Details |
 | :---------------------- | :------ | :------ | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | `bq` : `enabled` | boolean | `false` | Enable BQ. Weaviate uses binary quantization (BQ) compression when `true`.<br/><br/>The Python client does not use the `enabled` parameter. To enable BQ with the v4 client, set a `quantizer` in the collection definition. |
-| `bq` : `rescoreLimit` | integer | -1 | The minimum number of candidates to fetch before rescoring. |
+| `bq` : `rescoreLimit` | integer | `-1` | The minimum number of candidates to fetch before rescoring. The default of `-1` means Weaviate rescores only the requested number of results (the query `limit`), without additional over-fetching; set a higher value to over-fetch more compressed candidates and improve recall.<br/>(only when using the `flat` vector index type; on HNSW, BQ rescoring is automatic and this parameter has no effect) |
 | `bq` : `cache` | boolean | `false` | Whether to cache the vectors in memory.<br/>(only when using the `flat` vector index type) |
 | `vectorCacheMaxObjects` | integer | `1e12` | Maximum number of objects in the memory cache. By default, this limit is set to one trillion (`1e12`) objects when a new collection is created. For sizing recommendations, see [Vector cache considerations](/weaviate/concepts/vector-index#vector-cache-considerations). |
diff --git a/_includes/configuration/rq-compression-parameters.mdx b/_includes/configuration/rq-compression-parameters.mdx
index 2e51380f..2cf6129a 100644
--- a/_includes/configuration/rq-compression-parameters.mdx
+++ b/_includes/configuration/rq-compression-parameters.mdx
@@ -1,6 +1,6 @@
 | Parameter | Type | Default | Details |
 | :---------------------- | :------ | :------ | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | `rq`: `bits` | integer | `8` | The number of bits used to quantize each data point. Value can be `8` or `1`.<br/><br/>Learn more about [8-bit](/weaviate/concepts/vector-quantization#8-bit-rq) and [1-bit](/weaviate/concepts/vector-quantization#1-bit-rq) RQ. |
-| `rq`: `rescoreLimit` | integer | `-1` | The minimum number of candidates to fetch before rescoring. |
+| `rq`: `rescoreLimit` | integer | `20` / `512` | The minimum number of candidates to fetch before rescoring. On an HNSW index, defaults to `20` for 8-bit RQ and `512` for 1-bit RQ. On a flat index, the default is `-1`, which rescores only the requested number of results (the query `limit`) without additional over-fetching. Set a higher value to improve recall. Set to `0` to disable rescoring. |
 | `rq` : `cache` | boolean | `false` | Whether to cache the vectors in memory.<br/>(only when using the `flat` vector index type) |
 | `vectorCacheMaxObjects` | integer | `1e12` | Maximum number of objects in the memory cache. By default, this limit is set to one trillion (`1e12`) objects when a new collection is created. For sizing recommendations, see [Vector cache considerations](/weaviate/concepts/vector-index#vector-cache-considerations). |
diff --git a/_includes/configuration/sq-compression-parameters.mdx b/_includes/configuration/sq-compression-parameters.mdx
index a49ed00b..a54ccda9 100644
--- a/_includes/configuration/sq-compression-parameters.mdx
+++ b/_includes/configuration/sq-compression-parameters.mdx
@@ -1,6 +1,6 @@
 | Parameter | Type | Default | Details |
 | :---------------------- | :------ | :------ | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | `sq`: `enabled` | boolean | `false` | Uses SQ when `true`.<br/><br/>The Python client does not use the `enabled` parameter. To enable SQ with the v4 client, set a `quantizer` in the collection definition. |
-| `sq`: `rescoreLimit` | integer | -1 | The minimum number of candidates to fetch before rescoring. |
+| `sq`: `rescoreLimit` | integer | `20` | The minimum number of candidates to fetch before rescoring. Set to `0` to disable rescoring. |
 | `sq`: `trainingLimit` | integer | 100000 | The size of the training set to determine scalar bucket boundaries. |
 | `vectorCacheMaxObjects` | integer | `1e12` | Maximum number of objects in the memory cache. By default, this limit is set to one trillion (`1e12`) objects when a new collection is created. For sizing recommendations, see [Vector cache considerations](/weaviate/concepts/vector-index#vector-cache-considerations). |
From 8001d7ea912df36faa38f4a5642291a2fc6eb553 Mon Sep 17 00:00:00 2001
From: Ivan Despot <66276597+g-despot@users.noreply.github.com>
Date: Mon, 1 Jun 2026 10:15:05 +0200
Subject: [PATCH 4/4] Fix typo

---
 _includes/configuration/bq-compression-parameters.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/_includes/configuration/bq-compression-parameters.mdx b/_includes/configuration/bq-compression-parameters.mdx
index 8dafee4e..986af342 100644
--- a/_includes/configuration/bq-compression-parameters.mdx
+++ b/_includes/configuration/bq-compression-parameters.mdx
@@ -1,6 +1,6 @@
 | Parameter | Type | Default | Details |
 | :---------------------- | :------ | :------ | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | `bq` : `enabled` | boolean | `false` | Enable BQ. Weaviate uses binary quantization (BQ) compression when `true`.<br/><br/>The Python client does not use the `enabled` parameter. To enable BQ with the v4 client, set a `quantizer` in the collection definition. |
-| `bq` : `rescoreLimit` | integer | `-1` | The minimum number of candidates to fetch before rescoring. The default of `-1` means Weaviate rescores only the requested number of results (the query `limit`), without additional over-fetching; set a higher value to over-fetch more compressed candidates and improve recall.<br/>(only when using the `flat` vector index type; on HNSW, BQ rescoring is automatic and this parameter has no effect) |
+| `bq` : `rescoreLimit` | integer | `-1` | The minimum number of candidates to fetch before rescoring. The default of `-1` means Weaviate rescores only the requested number of results (the query `limit`), without additional over-fetching. Set a higher value to over-fetch more compressed candidates and improve recall.<br/>(Only when using the `flat` vector index type. On HNSW, BQ rescoring is automatic and this parameter has no effect.) |
 | `bq` : `cache` | boolean | `false` | Whether to cache the vectors in memory.<br/>(only when using the `flat` vector index type) |
 | `vectorCacheMaxObjects` | integer | `1e12` | Maximum number of objects in the memory cache. By default, this limit is set to one trillion (`1e12`) objects when a new collection is created. For sizing recommendations, see [Vector cache considerations](/weaviate/concepts/vector-index#vector-cache-considerations). |
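
For reviewers who want to sanity-check the wording in this series, the over-fetch and re-score flow it documents ("retrieves compressed objects up to whichever is greater", then re-scores with the uncompressed vectors) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Weaviate's implementation: the function names are hypothetical, the compression is a toy sign-bit scheme in the spirit of BQ, the index is a brute-force scan rather than HNSW or Weaviate's flat index, and the special handling of `rescoreLimit` set to `0` (rescoring disabled) is deliberately left out.

```python
import math


def binary_quantize(vector):
    """Toy BQ-style compression: keep only the sign of each dimension."""
    return [1 if x > 0 else 0 for x in vector]


def hamming(a, b):
    """Approximate distance between two compressed codes."""
    return sum(x != y for x, y in zip(a, b))


def search_with_rescoring(query, vectors, limit, rescore_limit):
    """Return the indices of the top `limit` vectors after re-scoring.

    Hypothetical sketch: candidates are picked using compressed codes,
    then re-ranked using the original, uncompressed vectors.
    """
    codes = [binary_quantize(v) for v in vectors]
    query_code = binary_quantize(query)

    # 1. Rank everything by the cheap, approximate (compressed) distance.
    by_approx = sorted(
        range(len(vectors)), key=lambda i: hamming(codes[i], query_code)
    )

    # 2. Over-fetch: keep whichever is greater, `limit` or `rescore_limit`.
    #    A value of -1 therefore means "no additional over-fetching".
    candidates = by_approx[: max(limit, rescore_limit)]

    # 3. Re-score the candidates with exact distances on uncompressed vectors.
    rescored = sorted(candidates, key=lambda i: math.dist(vectors[i], query))

    # 4. Return only the number of results the query asked for.
    return rescored[:limit]
```

With `limit=10` and `rescore_limit=200`, the sketch collects 200 compressed candidates, re-scores them, and returns the best 10, which matches the worked example in the docs text. With `rescore_limit=-1`, `max(limit, rescore_limit)` reduces to `limit`, mirroring the documented flat-index default.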