Skip to content

fix(index): drop tombstoned rows from stable row-id optimize filter#6969

Open
kaan-simbe wants to merge 1 commit into
lance-format:mainfrom
SimbeRobotics:fix/btree-stable-rowid-optimize-dedup
Open

fix(index): drop tombstoned rows from stable row-id optimize filter#6969
kaan-simbe wants to merge 1 commit into
lance-format:mainfrom
SimbeRobotics:fix/btree-stable-rowid-optimize-dedup

Conversation

@kaan-simbe
Copy link
Copy Markdown

Summary

optimize_indices on a stable-row-id dataset with a BTREE scalar index corrupts the index after a merge_insert UPDATE (no intervening compaction). The next index-using read panics with:

RowAddrTreeMap::from_sorted_iter called with non-sorted input
rust/lance-index/src/scalar/btree/flat.rs:58

Root cause

With stable row IDs the BTREE stores stable IDs (not physical addresses), and build_stable_row_id_filter (rust/lance/src/index/append.rs) builds the "retain old rows" allow-list from each retained fragment's row-id sequence — but it read the raw sequences without applying deletion vectors.

A merge_insert UPDATE keeps the updated row's stable ID in the old fragment's row-id sequence (the physical row is only tombstoned via the deletion vector) while a fresh copy of that same stable ID is written to a new fragment that lands in the unindexed delta. So during optimize:

  • old entry (old_value, S) — kept, because S is still in the allow-list
  • new entry (new_value, S) — added from the unindexed delta

Both survive the merge, the merged BTREE page holds two rows with stable ID S, and FlatIndex::try_new trips its strictly-sorted invariant on the next read.

Fix

Apply each retained fragment's deletion vector to its row-id sequence (RowIdSequence::mask) before building the allow-list, so updated/deleted stable IDs are excluded. The allow-list only shrinks — no new data is materialized in memory, which avoids the memory hazard raised in the discussion on #6041.

Relation to #6041

#6041 identified the same bug but stalled on its approach (materializing all new row IDs for the filter pass). This takes the opposite direction: prune the old side using deletion vectors that are already loaded per fragment.

Test plan

  • New regression test test_optimize_btree_after_merge_insert_update_with_stable_row_ids reproduces the panic without the fix and passes with it (stable row IDs → BTREE → merge_insert UPDATE → optimize_indices, no compaction → index-using reads return correct counts and each id resolves to exactly one row).
  • Existing test_optimize_btree_keeps_rows_with_stable_row_ids_after_compaction still passes.
  • cargo fmt --all --check, cargo clippy -p lance --tests -- -D warnings.

A merge_insert UPDATE on a stable-row-id dataset keeps the updated row's
stable ID in the old fragment's row-id sequence (only the deletion vector
marks the physical row stale), while a fresh copy of that same stable ID
arrives in the unindexed delta. build_stable_row_id_filter built its retain
allow-list from the raw row-id sequences without applying deletion vectors,
so the stale entry survived optimize_indices alongside the freshly-indexed
copy. The merged BTREE page then held two entries with the same stable ID,
tripping RowAddrTreeMap::from_sorted_iter ("non-sorted input") on the next
index-using read (lance-index/src/scalar/btree/flat.rs:58).

Mask each retained fragment's deletion vector against its row-id sequence
before building the allow-list so updated/deleted stable IDs are excluded.
Adds a regression test covering merge_insert UPDATE + optimize_indices with
no intervening compaction.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@github-actions github-actions Bot added the bug Something isn't working label May 27, 2026
@codecov
Copy link
Copy Markdown

codecov Bot commented May 28, 2026

Codecov Report

❌ Patch coverage is 98.03922% with 2 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
rust/lance/src/index/append.rs 98.03% 0 Missing and 2 partials ⚠️

📢 Thoughts on this report? Let us know!

@yanghua
Copy link
Copy Markdown
Collaborator

yanghua commented May 28, 2026

@kaan-simbe There is a conflict file.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants