diff --git a/_includes/configuration/bq-compression-parameters.mdx b/_includes/configuration/bq-compression-parameters.mdx
index 9552fb9b..986af342 100644
--- a/_includes/configuration/bq-compression-parameters.mdx
+++ b/_includes/configuration/bq-compression-parameters.mdx
@@ -1,6 +1,6 @@
| Parameter | Type | Default | Details |
| :---------------------- | :------ | :------ | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `bq` : `enabled` | boolean | `false` | Enable BQ. Weaviate uses binary quantization (BQ) compression when `true`.
The Python client does not use the `enabled` parameter. To enable BQ with the v4 client, set a `quantizer` in the collection definition. |
-| `bq` : `rescoreLimit` | integer | -1 | The minimum number of candidates to fetch before rescoring. |
+| `bq` : `rescoreLimit` | integer | `-1` | The minimum number of candidates to fetch before rescoring. The default of `-1` means Weaviate rescores only the requested number of results (the query `limit`), without additional over-fetching. Set a higher value to over-fetch more compressed candidates and improve recall.
(Only when using the `flat` vector index type. On HNSW, BQ rescoring is automatic and this parameter has no effect.) |
| `bq` : `cache` | boolean | `false` | Whether to cache the vectors in memory.
(only when using the `flat` vector index type) |
| `vectorCacheMaxObjects` | integer | `1e12` | Maximum number of objects in the memory cache. By default, this limit is set to one trillion (`1e12`) objects when a new collection is created. For sizing recommendations, see [Vector cache considerations](/weaviate/concepts/vector-index#vector-cache-considerations). |
diff --git a/_includes/configuration/rq-compression-parameters.mdx b/_includes/configuration/rq-compression-parameters.mdx
index 2e51380f..2cf6129a 100644
--- a/_includes/configuration/rq-compression-parameters.mdx
+++ b/_includes/configuration/rq-compression-parameters.mdx
@@ -1,6 +1,6 @@
| Parameter | Type | Default | Details |
| :---------------------- | :------ | :------ | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `rq`: `bits` | integer | `8` | The number of bits used to quantize each data point. Value can be `8` or `1`.
Learn more about [8-bit](/weaviate/concepts/vector-quantization#8-bit-rq) and [1-bit](/weaviate/concepts/vector-quantization#1-bit-rq) RQ. |
-| `rq`: `rescoreLimit` | integer | `-1` | The minimum number of candidates to fetch before rescoring. |
+| `rq`: `rescoreLimit` | integer | `20` / `512` | The minimum number of candidates to fetch before rescoring. On an HNSW index, defaults to `20` for 8-bit RQ and `512` for 1-bit RQ. On a flat index, the default is `-1`, which rescores only the requested number of results (the query `limit`) without additional over-fetching. Set a higher value to improve recall. Set to `0` to disable rescoring. |
| `rq` : `cache` | boolean | `false` | Whether to cache the vectors in memory.
(only when using the `flat` vector index type) |
| `vectorCacheMaxObjects` | integer | `1e12` | Maximum number of objects in the memory cache. By default, this limit is set to one trillion (`1e12`) objects when a new collection is created. For sizing recommendations, see [Vector cache considerations](/weaviate/concepts/vector-index#vector-cache-considerations). |
diff --git a/_includes/configuration/sq-compression-parameters.mdx b/_includes/configuration/sq-compression-parameters.mdx
index a49ed00b..a54ccda9 100644
--- a/_includes/configuration/sq-compression-parameters.mdx
+++ b/_includes/configuration/sq-compression-parameters.mdx
@@ -1,6 +1,6 @@
| Parameter | Type | Default | Details |
| :---------------------- | :------ | :------ | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `sq`: `enabled` | boolean | `false` | Uses SQ when `true`.
The Python client does not use the `enabled` parameter. To enable SQ with the v4 client, set a `quantizer` in the collection definition. |
-| `sq`: `rescoreLimit` | integer | -1 | The minimum number of candidates to fetch before rescoring. |
+| `sq`: `rescoreLimit` | integer | `20` | The minimum number of candidates to fetch before rescoring. Set to `0` to disable rescoring. |
| `sq`: `trainingLimit` | integer | 100000 | The size of the training set to determine scalar bucket boundaries. |
| `vectorCacheMaxObjects` | integer | `1e12` | Maximum number of objects in the memory cache. By default, this limit is set to one trillion (`1e12`) objects when a new collection is created. For sizing recommendations, see [Vector cache considerations](/weaviate/concepts/vector-index#vector-cache-considerations). |
diff --git a/docs/weaviate/concepts/vector-quantization.md b/docs/weaviate/concepts/vector-quantization.md
index 599de161..70bd3748 100644
--- a/docs/weaviate/concepts/vector-quantization.md
+++ b/docs/weaviate/concepts/vector-quantization.md
@@ -171,18 +171,27 @@ Learn more about how to [configure rotational quantization](../configuration/com
## Over-fetching / re-scoring
-Weaviate over-fetches results and then re-scores them when you use SQ, RQ, or BQ. This is because the distance calculation on the compressed vectors is not as accurate as the same calculation on the original vector embedding.
+All quantization methods in Weaviate use re-scoring to offset the recall loss caused by compression. The distance calculation on compressed vectors is not as accurate as on the original embeddings, so Weaviate fetches the original, uncompressed vectors for the result candidates and recalculates the distances.
-When you run a query, Weaviate compares the query limit against a configurable `rescoreLimit` parameter.
+### Configurable over-fetching (`rescoreLimit`)
-The query retrieves compressed objects until the object count reaches whichever limit is greater. Then, Weaviate fetches the original, uncompressed vector embeddings that correspond to the compressed vectors. The uncompressed vectors are used to recalculate the query distance scores.
+You can configure the amount of over-fetching using the `rescoreLimit` parameter for:
-For example, if a query is made with a limit of 10, and a rescore limit of 200, Weaviate fetches 200 objects. After rescoring, the query returns top 10 objects. This process offsets the loss in search quality (recall) that is caused by compression.
+- **SQ** and **RQ** on an HNSW index
+- **RQ** and **BQ** on a flat index
+
+When you run a query, Weaviate compares the query `limit` against `rescoreLimit` and retrieves compressed objects up to whichever is greater. It then re-scores those candidates using the uncompressed vectors.
+
+For example, if a query has a `limit` of 10 and a `rescoreLimit` of 200, Weaviate fetches 200 compressed candidates, re-scores them using their uncompressed vectors, and returns the top 10.
:::note RQ optimization
With RQ's high native recall of 98-99%, you can often disable rescoring (set `rescoreLimit` to 0) for maximum query performance with minimal impact on search quality.
:::
+### Automatic over-fetching (PQ, and BQ on HNSW)
+
+**PQ**, and **BQ on an HNSW index**, also perform over-fetching and re-scoring, but they handle it automatically: there is no `rescoreLimit` parameter to configure. During an HNSW search, the compressed vectors are used for the initial graph traversal, and the result candidates are then re-scored with the original uncompressed vectors stored on disk. For more details, see the [PQ rescoring blog post](https://weaviate.io/blog/pq-rescoring).
+
## Vector compression with vector indexing
### With an HNSW index
@@ -199,9 +208,9 @@ You might be also interested in our blog post [HNSW+PQ - Exploring ANN algorithm
## Rescoring
-Quantization inherently involves some loss information due to the reduction in information precision. To mitigate this, Weaviate uses a technique called rescoring, using the uncompressed vectors that are also stored alongside compressed vectors. Rescoring recalculates the distance between the original vectors of the returned candidates from the initial search. This ensures that the most accurate results are returned to the user.
+Quantization inherently involves some loss of information due to the reduction in precision. To mitigate this, all quantization methods (PQ, SQ, RQ, and BQ) use rescoring: Weaviate stores the original uncompressed vectors alongside the compressed ones and recalculates distances from the uncompressed vectors for the result candidates. This ensures that the most accurate results are returned to the user.
-In some cases, rescoring also includes over-fetching, whereby additional candidates are fetched to ensure that the top candidates are not omitted in the initial search.
+Rescoring also includes over-fetching, whereby additional candidates are fetched so that the top results are not missed in the initial compressed search. This over-fetching is configurable via the `rescoreLimit` parameter for SQ and RQ on an HNSW index, and for RQ and BQ on a flat index. PQ, and BQ on an HNSW index, perform over-fetching and rescoring automatically. See [Over-fetching / re-scoring](#over-fetching--re-scoring) for details.
## Further resources
diff --git a/docs/weaviate/starter-guides/managing-resources/compression.mdx b/docs/weaviate/starter-guides/managing-resources/compression.mdx
index f8209da2..f3178fb5 100644
--- a/docs/weaviate/starter-guides/managing-resources/compression.mdx
+++ b/docs/weaviate/starter-guides/managing-resources/compression.mdx
@@ -78,7 +78,7 @@ Typical recall rates:
- RQ: 98-99% recall
- BQ: Varies significantly based on data and model characteristics
-To improve recall with compressed vectors, Weaviate over-fetches a list of candidate vectors during a search. For each item on the candidate list, Weaviate fetches the corresponding uncompressed vector. To determine the final ranking, Weaviate calculates the distances from the uncompressed vectors to the query vector.
+To improve recall with compressed vectors, all quantization methods use rescoring. Weaviate fetches the original, uncompressed vectors for the result candidates and recalculates distances to determine the final ranking. You can configure over-fetching via the `rescoreLimit` parameter for SQ and RQ on an HNSW index, and for RQ and BQ on a flat index. PQ, and BQ on an HNSW index, handle over-fetching and rescoring automatically.
import RescoringIllustration from "/docs/weaviate/starter-guides/managing-resources/img/rescore-uncompressed-vectors.png";
@@ -86,7 +86,7 @@ import RescoringIllustration from "/docs/weaviate/starter-guides/managing-resour
The rescoring process is slower than an in-memory search, but since Weaviate only has to search a limited number of uncompressed vectors, the search is still very fast. Most importantly, rescoring with the uncompressed vectors greatly improves recall.
-The search algorithm uses over-fetching and rescoring so that you get the benefits of compression without losing the precision of an uncompressed vector search.
+The search algorithm uses rescoring (and over-fetching where configured) so that you get the benefits of compression without losing the precision of an uncompressed vector search. For more details, see [Over-fetching / re-scoring](/weaviate/concepts/vector-quantization#over-fetching--re-scoring).
#### Query speed