Reduce update-path spec-lock contention: hoist preprocessing, relabel unchanged vectors by ofiryanai · Pull Request #10012 · RediSearch/RediSearch

ofiryanai · 2026-06-07T09:40:06Z

Summary

Two changes to the HSET-update indexing path (IndexSpec_UpdateDoc) that shorten how long the spec write lock is held while a document is re-indexed, reducing contention with concurrent searches (which take the read side of the same lock). Targets the common pattern of frequent partial updates to documents that contain large, rarely-changing vector fields.

Changes

1. Hoist preprocessing out of the write lock (document.c, document.h, spec.c)
The per-field preprocessor loop is split out of Document_AddToIndexes into a new Document_Preprocess, and run — together with NewAddDocumentCtx and Document_MakeStringsOwner — before RedisSearchCtx_LockSpecWrite. Preprocessing (tokenization, vector blob copy/normalize, tag splitting, …) is pure per-document CPU work that only writes aCtx-local scratch and reads the main-thread-stable schema, so it doesn't need the lock. Only the actual index mutations (IndexDocument) now run under it.

2. Relabel unchanged vectors instead of delete + re-insert (indexer.c, indexer.h, document.c, spec.c)
Every HSET assigns a new internal doc id, which previously forced every vector through VecSimIndex_DeleteVector + VecSimIndex_AddVector on the tiered HNSW index even when its value was unchanged. That churned the graph (ghost/marked-deleted nodes, repair jobs) and burned background-worker CPU. The command filter already records which hash fields the command modified (aCtx->hashFields); for a vector field not in that set, makeDocumentId now relabels the existing vector from the old doc id to the new one via the new VecSimIndex_RelabelVector API (O(1), no graph churn) and sets skipVectorAdd so the indexer skips re-adding the identical vector. Fields that changed, and all vector fields when the changed set is unknown (filterCommands disabled), keep the existing delete + re-add path, so behavior is unchanged in those cases.

This implements the long-standing // TODO: use VecSimReplace in makeDocumentId.
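
The relabel decision in change 2 can be sketched with a toy fixed-size index. This is an assumption-laden illustration, not the VectorSimilarity implementation: ToyVecIndex and the toy_* functions are hypothetical, there is no graph, and the inserts/deletes counters merely stand in for the expensive operations the PR avoids. The field_unchanged flag models "the field is known not to be in the modified set"; it must be false when the field changed or when the changed set is unknown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_DIM 4
#define TOY_CAP 8

/* Hypothetical toy index: a flat array of labeled vectors. */
typedef struct {
    bool used;
    size_t label;               /* doc id the vector is stored under */
    float vec[TOY_DIM];
} ToySlot;

typedef struct {
    ToySlot slots[TOY_CAP];
    size_t inserts;             /* stands in for costly graph insertions */
    size_t deletes;             /* stands in for costly graph deletions */
} ToyVecIndex;

static ToySlot *toy_find(ToyVecIndex *ix, size_t label) {
    for (size_t i = 0; i < TOY_CAP; ++i) {
        if (ix->slots[i].used && ix->slots[i].label == label) {
            return &ix->slots[i];
        }
    }
    return NULL;
}

static bool toy_add(ToyVecIndex *ix, size_t label, const float *vec) {
    for (size_t i = 0; i < TOY_CAP; ++i) {
        if (!ix->slots[i].used) {
            ix->slots[i].used = true;
            ix->slots[i].label = label;
            memcpy(ix->slots[i].vec, vec, sizeof ix->slots[i].vec);
            ix->inserts++;
            return true;
        }
    }
    return false;               /* index full */
}

static bool toy_delete(ToyVecIndex *ix, size_t label) {
    ToySlot *slot = toy_find(ix, label);
    if (!slot) return false;
    slot->used = false;
    ix->deletes++;
    return true;
}

/* Re-key an existing vector without touching its data. */
static bool toy_relabel(ToyVecIndex *ix, size_t old_label, size_t new_label) {
    ToySlot *slot = toy_find(ix, old_label);
    if (!slot || toy_find(ix, new_label)) return false;
    slot->label = new_label;
    return true;
}

/* Returns true when the vector was relabeled and the re-add was skipped. */
static bool toy_update(ToyVecIndex *ix, size_t old_id, size_t new_id,
                       bool field_unchanged, const float *vec) {
    assert(ix != NULL && vec != NULL);
    if (field_unchanged && toy_relabel(ix, old_id, new_id)) {
        return true;            /* fast path: no delete, no insert */
    }
    toy_delete(ix, old_id);     /* fallback: existing delete + re-add path */
    toy_add(ix, new_id, vec);
    return false;
}
```

In this sketch an update that leaves the vector field alone leaves both counters untouched, while an update that changes it (or cannot prove it unchanged) pays one delete and one insert, which mirrors the trade-off described above.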

Dependency

Requires the new VecSimIndex_RelabelVector API: RedisAI/VectorSimilarity#978. This PR bumps the deps/VectorSimilarity submodule to that branch (8ec72b63); it must be repointed to the merged commit before this is mergeable.

Validation

The VecSim API has unit tests (single/multi, flat/HNSW/tiered incl. the pending-insert-job rekey case). The RediSearch changes were not built/tested locally — the local (macOS) build is blocked by the VectorSimilarity/SVS dependency chain — so please rely on CI to build and run the suite. Suggested focused coverage: HSET partial-update flows with vector + text/tag fields under filterCommands, and concurrent search/update.

This PR requires release notes
This PR does not require release notes

🤖 Generated with Claude Code

… unchanged vectors

Two changes to the HSET-update indexing path (IndexSpec_UpdateDoc) that shorten how long the spec write lock is held while a document is re-indexed, reducing contention with concurrent searches (which take the read side of the same lock).

1. Hoist preprocessing out of the write lock. Split the per-field preprocessor loop out of Document_AddToIndexes into a new Document_Preprocess, and run it (plus NewAddDocumentCtx and Document_MakeStringsOwner) BEFORE taking the spec write lock. Preprocessing -- tokenization, vector blob copy/normalize, tag splitting, etc. -- is pure per-document CPU work that only writes aCtx-local scratch and reads the (main-thread-stable) schema, so it does not need the lock. Only the actual index mutations (IndexDocument) now run under it.

2. Relabel unchanged vectors instead of delete + re-insert. Every HSET assigns a new internal doc id, which previously forced every vector to be deleted from and re-inserted into the (tiered) HNSW index even when the vector value was unchanged -- churning the graph and burning background-worker CPU. The command filter already records which hash fields the command modified; for a vector field NOT in that set, makeDocumentId now relabels the existing vector from the old doc id to the new one via the new VecSimIndex_RelabelVector API (O(1), no graph churn) and marks the field (skipVectorAdd) so the indexer skips re-adding the identical vector. Fields that changed -- or when the changed set is unknown (filter disabled) -- keep the existing delete + re-add path.

This requires VecSimIndex_RelabelVector and bumps the VectorSimilarity submodule to the branch carrying it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

jit-ci · 2026-06-07T09:42:35Z

🛡️ Jit Security Scan Results

✅ No security findings were detected in this PR

Security scan by Jit

The command filter (search-partial-indexed-docs) opened the key only to check it is an existing hash before capturing the modified field names. That is a keyspace lookup plus a key-string allocation on every hash-write command, including writes to keys no index covers. It is unnecessary: hashFields is consumed only when the matching keyspace notification fires (so the command really did modify a hash), and freeHashFields() clears any stale capture at the top of the next hash-write. A new-key HSET now goes through the changed-field gate instead of a forced full reindex, which is correct and slightly cheaper.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
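
The capture lifecycle this commit relies on can be sketched as follows. This is a hypothetical model, not the RediSearch filter: ToyCapture and the toy_* functions are invented for illustration, and no Redis module API is called. It shows why dropping the key lookup is safe under the stated assumptions: every hash write first clears whatever the previous write captured, so a capture that no notification consumed cannot leak into a later update.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of the captured "modified fields" set. */
typedef struct {
    char **fields;
    size_t n;
} ToyCapture;

static char *toy_dup(const char *s) {
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (copy) memcpy(copy, s, len);
    return copy;
}

/* Plays the role of freeHashFields(): drop any previous capture. */
static void toy_free_capture(ToyCapture *cap) {
    assert(cap != NULL);
    for (size_t i = 0; i < cap->n; ++i) free(cap->fields[i]);
    free(cap->fields);
    cap->fields = NULL;
    cap->n = 0;
}

/* Filter step for a hash-write command: clear the stale capture, then
 * record the field names. No key lookup happens here. */
static void toy_filter_hash_write(ToyCapture *cap,
                                  const char *const *fields, size_t n) {
    toy_free_capture(cap);
    cap->fields = calloc(n, sizeof *cap->fields);
    if (!cap->fields) return;           /* also covers n == 0 */
    for (size_t i = 0; i < n; ++i) cap->fields[i] = toy_dup(fields[i]);
    cap->n = n;
}

/* Notification step: asks whether a field was part of the last write. */
static bool toy_field_changed(const ToyCapture *cap, const char *name) {
    for (size_t i = 0; i < cap->n; ++i) {
        if (cap->fields[i] && strcmp(cap->fields[i], name) == 0) return true;
    }
    return false;
}
```

If a write is captured but no notification follows (for example, the command did not modify a hash), the next call to toy_filter_hash_write discards that capture before recording the new one, so toy_field_changed only ever sees the fields of the most recent write.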

With PARTIAL_INDEXED_DOCS enabled, exercises the relabel path end-to-end: an HSET that changes only non-vector fields must keep the vector intact (KNN still returns the doc at distance 0), while an HSET that changes the vector must reflect the new one. Also covers the new-key gating after dropping the filter's OpenKey, and DEL. Runs over HNSW (tiered) and FLAT.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

FT.SEARCH returned the document's vector blob, which the decode_responses test client failed to UTF-8-decode (UnicodeDecodeError). RETURN only the distance and the text field.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

sonarqubecloud · 2026-06-07T12:36:13Z

Quality Gate passed

Issues
3 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

github-actions Bot added the size:M label Jun 7, 2026

ofiryanai and others added 3 commits June 7, 2026 14:34

ofiryanai wants to merge 4 commits into 8.4 from ofir-partial-update-lock
