fix(upsert): skip redundant writes on idempotent upsert_e and upsert_v #888
keithf123r wants to merge 3 commits into HelixDB:main
| @@ -590,6 +604,10 @@ impl<'db, 'arena, 'txn, I: Iterator<Item = Result<TraversalValue<'arena>, GraphE | |||
| vector.properties = Some(map); | |||
| } | |||
| Some(old) => { | |||
| if props.iter().all(|(k, v)| old.get(k) == Some(v)) { | |||
| return Ok(TraversalValue::Vector(vector)); | |||
| } | |||
Early return skips put_vector even when query data differs
In the Vector variant (Some(Ok(TraversalValue::Vector(...)))), when the early-return fires (either props.is_empty() with None properties, or all props match old), put_vector is skipped entirely. However, the query parameter — carrying potentially updated vector embedding data — is never applied to vector.data in this branch. This means a caller that passes a different query but identical properties will silently get a no-op: the stored vector data stays unchanged, with no error or indication.
By contrast, the VectorNodeWithoutVectorData path (lines ~762–908) correctly sets vector.data = query before writing, which is exactly why that path always calls put_vector even when properties are unchanged.
If the Vector path is intentionally designed to not update vector embedding data (i.e., query is used only for creation / VectorNodeWithoutVectorData), the early-return optimization is correct but the silent data-skipping is easy to misuse. Consider either:
- Setting vector.data = query before the early-return check (so changed embeddings are always persisted), or
- Adding a code comment explicitly documenting that the Vector path never updates vector.data, so the skip is safe.
The new tests (test_upsert_v_noop_when_all_properties_identical) only pass an identical query when testing the no-op path, so the case of "same props, different embedding" is not exercised.
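The "same props, different embedding" case can be illustrated on a toy model. The sketch below is not HelixDB code: `StoredVector`, `upsert_props_only`, and `upsert_with_data` are hypothetical names, properties are simplified to string pairs, and each function returns `true` when a write would have been issued. The second function shows one possible reading of the first suggestion, in which the skip also requires the embedding to be unchanged.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a stored vector; the real HelixDB type differs.
#[derive(Debug, Clone, PartialEq)]
struct StoredVector {
    data: Vec<f64>,
    properties: Option<HashMap<String, String>>,
}

/// Models the Vector path as described in the review: `query` is ignored and
/// only the properties decide whether a write happens.
/// Returns true when a write would be issued.
fn upsert_props_only(v: &mut StoredVector, _query: &[f64], props: &[(String, String)]) -> bool {
    if let Some(old) = v.properties.as_mut() {
        // Every incoming pair is already stored with the same value: skip.
        if props.iter().all(|(k, val)| old.get(k) == Some(val)) {
            return false;
        }
        old.extend(props.iter().cloned());
        return true;
    }
    // No stored properties and nothing incoming: skip.
    if props.is_empty() {
        return false;
    }
    v.properties = Some(props.iter().cloned().collect());
    true
}

/// Assumed variant (not in the PR): the embedding is compared as well, so a
/// changed `query` forces a write even when the properties are identical.
fn upsert_with_data(v: &mut StoredVector, query: &[f64], props: &[(String, String)]) -> bool {
    let data_changed = v.data.as_slice() != query;
    if data_changed {
        v.data = query.to_vec();
    }
    let props_changed = upsert_props_only(v, query, props);
    data_changed || props_changed
}
```

With identical properties and a different `query`, `upsert_props_only` reports no write and leaves `data` untouched, which is the untested case the comment points at; `upsert_with_data` reports a write and stores the new embedding.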
This is by design. The two paths serve different purposes:
- TraversalValue::Vector: the vector already has its data loaded, so this path only updates props.
- VectorNodeWithoutVectorData: the vector was loaded without its embedding data, so this path explicitly sets vector.data = query to fill it in.
The early return doesn't change this: the query parameter was already being ignored on the TraversalValue::Vector path.
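The split between the two paths can be sketched as a small model. Everything here is a simplified assumption for illustration: `Existing`, `Outcome`, `merge_props`, and this `upsert_v` signature are hypothetical and do not mirror the real traversal types. The `wrote` flag stands in for whether put_vector would be called.

```rust
use std::collections::HashMap;

type Props = HashMap<String, String>;

/// Hypothetical, simplified stand-ins for the two variants named in the reply.
enum Existing {
    /// Embedding already loaded; only properties may change.
    Vector { data: Vec<f64>, properties: Props },
    /// Loaded without its embedding; `query` fills it in.
    VectorNodeWithoutVectorData { properties: Props },
}

/// Result of the modelled upsert; `wrote` says whether a write was issued.
struct Outcome {
    data: Vec<f64>,
    properties: Props,
    wrote: bool,
}

/// Merges `props` into `properties`; returns false when nothing would change.
fn merge_props(properties: &mut Props, props: &[(String, String)]) -> bool {
    if props.iter().all(|(k, v)| properties.get(k) == Some(v)) {
        return false;
    }
    properties.extend(props.iter().cloned());
    true
}

fn upsert_v(existing: Existing, query: &[f64], props: &[(String, String)]) -> Outcome {
    match existing {
        // `query` is ignored here; the write is skipped if props are unchanged.
        Existing::Vector { data, mut properties } => {
            let wrote = merge_props(&mut properties, props);
            Outcome { data, properties, wrote }
        }
        // The property merge may be a no-op, but the embedding is always
        // taken from `query` and the write always happens.
        Existing::VectorNodeWithoutVectorData { mut properties } => {
            merge_props(&mut properties, props);
            Outcome { data: query.to_vec(), properties, wrote: true }
        }
    }
}
```

In this model the early return only affects the first arm, which never read `query` to begin with, matching the point made in the reply.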
Description
upsert_n already skips the write when properties haven't changed. upsert_e and upsert_v were missing the same check, so repeated upserts with identical data still wrote to disk every time, bloating the database file over time.
This PR adds the same property-equality check to all upsert paths.
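As a minimal sketch of what such a property-equality check looks like in isolation, the function below decides whether an upsert can skip its write. `Props` and `upsert_is_noop` are hypothetical names and property values are simplified to strings; only the `all(...)` comparison is taken from the diff in this PR.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a stored property map.
type Props = HashMap<String, String>;

/// Returns true when applying `incoming` to `stored` would change nothing,
/// so the caller can return early without writing.
fn upsert_is_noop(stored: Option<&Props>, incoming: &[(String, String)]) -> bool {
    match stored {
        // Nothing stored and nothing to add.
        None => incoming.is_empty(),
        // Every incoming pair is already stored with an equal value.
        // `all` on an empty iterator is true, so empty `incoming` is a no-op.
        Some(old) => incoming.iter().all(|(k, v)| old.get(k) == Some(v)),
    }
}
```

Note that the check is one-directional: stored keys that are absent from `incoming` do not prevent the skip, which fits an upsert that merges properties and does not replace the whole map.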
Greptile Summary
This PR extends the idempotent-write optimisation (already present in upsert_n) to upsert_e and upsert_v, short-circuiting redundant LMDB writes when an upsert's incoming properties are identical to what is already stored. The logic is correct for the edge and the full-vector paths, but there is a meaningful asymmetry in how the three upsert_v variants handle embedding data that is worth resolving.
- upsert_e: two new early returns, one for edge.properties == None && props.is_empty() and one for Some(old) when all incoming props already match old. Both correctly skip the edges_db.put call.
- upsert_v (Vector): mirrors the above for vectors. However, the Vector branch never sets vector.data = query, so a caller that passes a different embedding vector but unchanged properties will hit the early return and silently lose the embedding update. The VectorNodeWithoutVectorData branch avoids this by always writing (it sets vector.data = query first), which highlights the inconsistency.
- upsert_v (VectorNodeWithoutVectorData): correctly wraps only the property-merge logic in the skip guard, but unconditionally calls put_vector to persist updated embedding data.
- [...] VectorNodeWithoutVectorData (noop merge but still writes), and for Some(old) + empty props (vacuous-truth early return).
Important Files Changed
- [...] upsert_e and upsert_v (Vector) to skip redundant LMDB writes when properties are unchanged; the VectorNodeWithoutVectorData path correctly skips only the property merge but still writes for vector data. Main concern: the Vector variant early return silently skips put_vector even when query (new embedding data) differs from the stored vector.data, because vector.data = query is never set in that branch.
- [...] upsert_e (identical props, empty props) and upsert_v (identical props, empty props). Tests are well-structured and correctly assert identity and property preservation. Missing coverage for: the VectorNodeWithoutVectorData noop-property-merge-but-still-writes path, and the Some(old) + empty props edge case for both ops.
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[upsert_e / upsert_v called] --> B{Existing record found?}
    B -- No --> C[Create new record + write to LMDB]
    B -- Yes: Vector case --> D{vector.properties?}
    B -- Yes: Edge case --> E{edge.properties?}
    B -- Yes: VectorNodeWithoutVectorData --> VND[Set vector.data = query]
    D -- None --> D1{props.is_empty?}
    D1 -- Yes --> SKIP1[⏭ Early return - skip put_vector]
    D1 -- No --> D2[Add secondary indices + create props map]
    D2 --> WRITE1[put_vector]
    D -- Some old --> D3{all props match old?}
    D3 -- Yes --> SKIP2[⏭ Early return - skip put_vector]
    D3 -- No --> D4[Update secondary indices + merge props]
    D4 --> WRITE2[put_vector]
    E -- None --> E1{props.is_empty?}
    E1 -- Yes --> SKIP3[⏭ Early return - skip edges_db.put]
    E1 -- No --> E2[Create new props map]
    E2 --> WRITE3[edges_db.put]
    E -- Some old --> E3{all props match old?}
    E3 -- Yes --> SKIP4[⏭ Early return - skip edges_db.put]
    E3 -- No --> E4[Merge props]
    E4 --> WRITE4[edges_db.put]
    VND --> VND2{vector.properties?}
    VND2 -- None --> VND3{props.is_empty?}
    VND3 -- Yes --> VND_WRITE[put_vector - always writes]
    VND3 -- No --> VND4[Add secondary indices + create props map]
    VND4 --> VND_WRITE
    VND2 -- Some old --> VND5{all props match old?}
    VND5 -- Yes --> VND_WRITE
    VND5 -- No --> VND6[Update secondary indices + merge props]
    VND6 --> VND_WRITE
    style SKIP1 fill:#ffd700
    style SKIP2 fill:#ffd700
    style SKIP3 fill:#ffd700
    style SKIP4 fill:#ffd700
    style VND_WRITE fill:#90EE90
    style WRITE1 fill:#90EE90
    style WRITE2 fill:#90EE90
    style WRITE3 fill:#90EE90
    style WRITE4 fill:#90EE90
Last reviewed commit: "fix(upsert): skip re..."