feat: RDB persistence, virtual keys, and module wiring #384
Conversation
- redis_type.rs: full RDB encode/decode with virtual key splitting, GRAPHMETA_TYPE for C FalkorDB compat, on_persistence event handling
- lib.rs: register GRAPHMETA_TYPE, GRAPH.EFFECT, GRAPH.DEBUG commands, serializers module
- module_init.rs: subscribe to RedisModuleEvent_Persistence
- debug.rs: GRAPH.DEBUG AUX START/END for testing RDB load boundaries
- query.rs: add version check on GRAPH.QUERY for schema staleness
- commands/mod.rs: register debug module
- test_rdb_compat.py: C-to-Rust and Rust-to-C replication via Docker
- test_rdb_load.py: rewritten for live data + DEBUG AUX
- test_persistency.py: update expectations for new behavior
- Remove test_prev_rdb_decode.py and old dump files
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review; this behavior can be configured.
📝 Walkthrough
Adds RDB persistence machinery and virtual-key lifecycle for multi-key graph saves/loads, threads pre-built attribute snapshots into RDB encoding, adds debug/versioning commands, updates query version checks, and introduces extensive RDB compatibility and load tests.
Sequence Diagram(s)
sequenceDiagram
actor Redis as Redis Server
participant Persist as Persistence Handler
participant VKey as VirtualKeyManager
participant Graph as Graph Instance
participant Encoder as RDB Encoder
participant RDB as RDB Storage
Redis->>Persist: Persistence start event
Persist->>VKey: create_virtual_keys(ctx)
VKey->>Redis: create virtual graphmeta keys
Graph->>Graph: needs_rdb_snapshot()
alt needs snapshot
Graph->>Graph: build_rdb_snapshots()
end
Encoder->>Graph: rdb_save_graph(snapshots?)
Graph->>Encoder: encode_payload(with snapshots)
Encoder->>RDB: write graphdata and graphmeta entries
RDB-->>Persist: RDB write complete
Persist->>VKey: delete_stale_virtual_keys(ctx)
VKey->>Redis: remove temporary virtual keys
sequenceDiagram
actor Client as Redis Client
participant Cmd as Query Handler
participant Graph as Graph Storage
Client->>Cmd: GRAPH.QUERY ... VERSION <n>
Cmd->>Graph: read current schema_version
Graph-->>Cmd: current_version
Cmd->>Cmd: compare provided vs current
alt match
Cmd-->>Client: execute query and return results
else mismatch
Cmd-->>Client: RESP array [ "ERR invalid graph version", current_version ]
end
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes
🚥 Pre-merge checks: ✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
avoiding DUMP/RESTORE version-checksum mismatches.
"""

import os
import time
from random import randint, seed
from common import *
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #384 +/- ##
==========================================
+ Coverage 78.68% 84.94% +6.26%
==========================================
Files 118 119 +1
Lines 32519 33212 +693
==========================================
+ Hits 25586 28211 +2625
+ Misses 6933 5001 -1932
Flags with carried forward coverage won't be shown.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
flow_tests_done.txt (1)
17-61: ⚠️ Potential issue | 🟠 Major
Add tests/flow/test_rdb_compat.py to flow_tests_done.txt.
The new RDB compatibility test file exists but is missing from this manifest. Since flow.sh uses flow_tests_done.txt to enumerate runnable flow tests, the new test coverage won't execute in CI.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@flow_tests_done.txt` around lines 17 - 61, Add the missing test entry for the new RDB compatibility test to the test manifest: open flow_tests_done.txt and add a line containing tests/flow/test_rdb_compat.py so the flow.sh enumerator picks up and runs the new test; ensure the line follows the same formatting as the other test entries.
🧹 Nitpick comments (3)
tests/flow/test_persistency.py (2)
109-111: Track the TODO for re-enabling this test.
The test is skipped pending index completion. Consider creating a tracking issue to ensure this test is re-enabled once the index implementation is complete.
Would you like me to open an issue to track re-enabling test_save_load after indexes are completed?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/flow/test_persistency.py` around lines 109 - 111, Create a tracking issue for re-enabling the skipped test by opening an issue that references the skipped test function test_save_load and the `@skip`() decorator in tests/flow/test_persistency.py; include context explaining it is skipped "pending index completion", link to any relevant index implementation PRs or tickets, add an assignee or label (eg. "needs-indexes" or "tests") and update the test file TODO comment to include the new issue number so the skip is easy to find and unblock later.
275-277: Track the TODO for re-enabling this test.
The test is skipped pending bulk loader implementation. Consider creating a tracking issue to ensure this test is re-enabled once the bulk loader is implemented.
Would you like me to open an issue to track re-enabling test_bulk_insert when the bulk loader is implemented?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/flow/test_persistency.py` around lines 275 - 277, Create a tracking issue in your issue tracker for re-enabling the skipped test_bulk_insert and then update the test code: open an issue referencing "test_bulk_insert" and the missing bulk loader feature, paste the new issue ID or URL into the TODO comment above the test, and replace the bare `@skip`() with a skip that includes the issue reference in its reason (e.g., `@skip`(reason="pending bulk loader - ISSUE-123")). This ensures the TODO is tracked and the skip clearly points to the follow-up task.
tests/flow/test_rdb_compat.py (1)
18-29: Consider adding timeout/retry for port binding.
Using randint(49152, 65535) for port selection could occasionally collide with a port already in use. The current implementation would fail with a Docker error in that case. If port collisions become an issue in CI, consider either:
- Letting Docker assign a random port (ports={'6379/tcp': None}) and reading the assigned port back
- Adding retry logic with a different port on failure
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/flow/test_rdb_compat.py` around lines 18 - 29, The run_db function currently picks a random port with randint(49152, 65535) which can collide and cause the Docker container creation to fail; change run_db to either (A) let Docker assign a random host port by using ports={'6379/tcp': None} and then read the mapped port from the created container (container.attrs['NetworkSettings']['Ports']['6379/tcp'][0]['HostPort']) or (B) wrap the client.containers.run call in retry logic that catches docker.errors.APIError (or the relevant exception) and retries with a new random port a limited number of times before failing; update the return to still provide the container and resolved port either way.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/redis_type.rs`:
- Around line 780-791: The save/load pair for graphmeta is asymmetric:
graphmeta_rdb_save writes nothing but graphmeta_rdb_load always expects a graph
payload, so a stale key surviving save will break restore; fix by making them
symmetric — modify graphmeta_rdb_save to emit a simple explicit sentinel entry
(e.g., a zero-length marker or specific version/tag payload) that
graphmeta_rdb_load recognizes as “empty/stale” and returns no graph, or
alternatively update graphmeta_rdb_load to accept and treat an empty/absent
payload as the same sentinel; update the loader logic in graphmeta_rdb_load to
check for and handle that sentinel (or a zero-length read) and return early
without error.
- Around line 428-444: The SCAN reply handling uses unsafe UTF-8 conversions:
replace uses of std::str::from_utf8_unchecked for the cursor (new_cursor) and
for key_name (result of RedisModule_CallReplyStringPtr) with a binary-safe
conversion such as std::str::from_utf8_lossy() or work with the raw bytes
(Vec<u8>/ &[u8]) instead; update the code paths around call_reply_array_element
and the RedisModule_CallReplyStringPtr usage so comparisons and string creation
use the lossless/binary-safe representation rather than unchecked UTF-8.
---
Outside diff comments:
In `@flow_tests_done.txt`:
- Around line 17-61: Add the missing test entry for the new RDB compatibility
test to the test manifest: open flow_tests_done.txt and add a line containing
tests/flow/test_rdb_compat.py so the flow.sh enumerator picks up and runs the
new test; ensure the line follows the same formatting as the other test entries.
---
Nitpick comments:
In `@tests/flow/test_persistency.py`:
- Around line 109-111: Create a tracking issue for re-enabling the skipped test
by opening an issue that references the skipped test function test_save_load and
the `@skip`() decorator in tests/flow/test_persistency.py; include context
explaining it is skipped "pending index completion", link to any relevant index
implementation PRs or tickets, add an assignee or label (eg. "needs-indexes" or
"tests") and update the test file TODO comment to include the new issue number
so the skip is easy to find and unblock later.
- Around line 275-277: Create a tracking issue in your issue tracker for
re-enabling the skipped test_bulk_insert and then update the test code: open an
issue referencing "test_bulk_insert" and the missing bulk loader feature, paste
the new issue ID or URL into the TODO comment above the test, and replace the
bare `@skip`() with a skip that includes the issue reference in its reason (e.g.,
`@skip`(reason="pending bulk loader - ISSUE-123")). This ensures the TODO is
tracked and the skip clearly points to the follow-up task.
In `@tests/flow/test_rdb_compat.py`:
- Around line 18-29: The run_db function currently picks a random port with
randint(49152, 65535) which can collide and cause the Docker container creation
to fail; change run_db to either (A) let Docker assign a random host port by
using ports={'6379/tcp': None} and then read the mapped port from the created
container
(container.attrs['NetworkSettings']['Ports']['6379/tcp'][0]['HostPort']) or (B)
wrap the client.containers.run call in retry logic that catches
docker.errors.APIError (or the relevant exception) and retries with a new random
port a limited number of times before failing; update the return to still
provide the container and resolved port either way.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 9ca63d8f-2ace-4fb8-b612-69591acb4503
📒 Files selected for processing (22)
- flow_tests_done.txt
- flow_tests_todo.txt
- graph/src/graph/graph.rs
- graph/src/runtime/pending.rs
- src/commands/debug.rs
- src/commands/effect.rs
- src/commands/mod.rs
- src/commands/query.rs
- src/lib.rs
- src/module_init.rs
- src/redis_type.rs
- tests/flow/dumps/10.dump
- tests/flow/dumps/11.dump
- tests/flow/dumps/12.dump
- tests/flow/dumps/13.dump
- tests/flow/dumps/14.dump
- tests/flow/dumps/15.dump
- tests/flow/test_persistency.py
- tests/flow/test_prev_rdb_decode.py
- tests/flow/test_rdb_compat.py
- tests/flow/test_rdb_load.py
- tests/flow/test_replication.py
💤 Files with no reviewable changes (2)
- flow_tests_todo.txt
- tests/flow/test_prev_rdb_decode.py
if let Some(graph) = read_key.get_value::<Arc<RwLock<ThreadedGraph>>>(&GRAPH_TYPE)? {
    let graph = graph.clone();

    // Check version if provided
    if let Some(provided_version) = version_check {
        let current_schema_version = graph.read().graph.read().borrow().schema_version;
        if provided_version != current_schema_version {
            drop(read_key);
            drop(graph);
            // Return array with [error, version]
            raw::reply_with_array(ctx.ctx, 2);
            let err_msg = CString::new("ERR invalid graph version").unwrap();
            raw::reply_with_error(ctx.ctx, err_msg.as_ptr());
            raw::reply_with_long_long(ctx.ctx, current_schema_version as i64);
            return Ok(redis_module::RedisValue::NoReply);
        }
    }
Move the schema-version check into the serialized execution path.
These comparisons run before query_mut(...), so another writer can change the schema between the check and the actual query execution. That leaves a TOCTOU window where stale clients still get through, or valid clients are checked against an older snapshot. Validate version_check after the request has entered the same queue/lock that executes the query.
Based on learnings, Redis GRAPH.QUERY command must accept both read and write Cypher queries and block client during execution.
Also applies to: 98-112
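To make the suggested ordering concrete, here is a minimal, self-contained sketch; Graph, Reply, and the plain Mutex are illustrative stand-ins, not the crate's actual types. The version comparison only happens after the execution lock is held, so no writer can bump the schema version between the check and the query.

```rust
use std::sync::Mutex;

struct Graph {
    schema_version: u64,
}

enum Reply {
    Rows(Vec<String>),
    VersionMismatch(u64),
}

/// Check the client-supplied version only once the same lock that serializes
/// query execution is held, closing the TOCTOU window.
fn run_query(graph: &Mutex<Graph>, provided_version: Option<u64>) -> Reply {
    let g = graph.lock().unwrap();
    if let Some(v) = provided_version {
        if v != g.schema_version {
            return Reply::VersionMismatch(g.schema_version);
        }
    }
    Reply::Rows(vec![]) // placeholder for the actual query execution
}

fn main() {
    let graph = Mutex::new(Graph { schema_version: 3 });
    assert!(matches!(run_query(&graph, Some(2)), Reply::VersionMismatch(3)));
}
```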
let new_cursor = std::str::from_utf8_unchecked(std::slice::from_raw_parts(
    cursor_ptr.cast(),
    cursor_len,
));
let done = new_cursor == "0";

// Get keys array.
let arr_reply = raw::call_reply_array_element(reply, 1);
let arr_len = raw::call_reply_length(arr_reply);

for i in 0..arr_len {
    let elem = raw::call_reply_array_element(arr_reply, i);
    let mut key_len: usize = 0;
    let kptr = raw::RedisModule_CallReplyStringPtr.unwrap()(elem, &raw mut key_len);
    let key_name =
        std::str::from_utf8_unchecked(std::slice::from_raw_parts(kptr.cast(), key_len))
            .to_string();
🧩 Analysis chain
🏁 Script executed:
# First, let's check the file exists and examine the specific lines mentioned
wc -l src/redis_type.rs
Repository: FalkorDB/falkordb-rs-next-gen
Length of output: 94
🏁 Script executed:
# Look at the code around lines 428-444
sed -n '420,450p' src/redis_type.rs
Repository: FalkorDB/falkordb-rs-next-gen
Length of output: 1365
🏁 Script executed:
# Look at the code around lines 593-610
sed -n '585,620p' src/redis_type.rs
Repository: FalkorDB/falkordb-rs-next-gen
Length of output: 1408
🏁 Script executed:
# Let's also see the full context of these functions
sed -n '400,500p' src/redis_type.rs
Repository: FalkorDB/falkordb-rs-next-gen
Length of output: 3867
🌐 Web query:
Redis SCAN command cursor format binary safe keys
💡 Result:
The Redis SCAN command uses a cursor that is an unsigned 64-bit integer (uint64_t), returned and accepted as a binary-safe string in the RESP protocol. This format ensures it is safe for transmission over the network without interpretation issues, even for large values exceeding signed 64-bit integer limits. Clients must handle it as a string to preserve values larger than 2^63-1, avoiding signed integer conversion errors that can cause infinite loops or incorrect iterations. The cursor represents an internal iteration position in Redis's hash table implementation (dict.c), using bit-reversed incrementing for stateless traversal resilient to resizing/rehashing. It starts at "0" and returns to "0" when iteration completes. Keys returned are binary-safe strings matching Redis key encoding. Official documentation confirms: SCAN returns a two-element array where the first element is a bulk string representing the unsigned 64-bit cursor. This applies similarly to SSCAN, HSCAN, ZSCAN.
Citations:
- 1: https://redis.io/docs/latest/commands/scan/
- 2: https://redis.io/docs/latest/commands/scan
- 3: https://redis.io/commands/hscan
- 4: https://redis.com/commands/sscan/
- 5: https://redis.io/commands/hscan/
- 6: http://man.hubwiz.com/docset/Redis.docset/Contents/Resources/Documents/commands/scan.html
- 7: Redis scan cursor returns unsigned int, conversion to signed int causes incorrect cursor on following iteration. phpredis/phpredis#2454
- 8: https://github.com/redis/redis/blob/unstable/src/dict.c
- 9: https://github.com/redis/redis/blob/462e603a/src/db.c
Use safe UTF-8 conversion for binary-safe SCAN reply data.
Redis keys are binary-safe and can contain non-UTF-8 bytes. Using from_utf8_unchecked() on key names from SCAN replies is undefined behavior. The cursor is also technically binary-safe (though Redis returns it as ASCII in practice). Both conversions at lines 428–444 and 593–610 should use from_utf8_lossy() instead, or keep the data as raw bytes if only needed for comparisons.
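For illustration, a stand-alone sketch (not the module's actual code) of a binary-safe conversion from a raw (pointer, length) reply into an owned string; invalid UTF-8 is replaced rather than triggering undefined behavior.

```rust
/// Binary-safe conversion of a reply's raw bytes into an owned String.
///
/// # Safety
/// The caller must guarantee `ptr` is valid for `len` bytes while this runs.
unsafe fn reply_bytes_to_string(ptr: *const u8, len: usize) -> String {
    let bytes = unsafe { std::slice::from_raw_parts(ptr, len) };
    // Invalid UTF-8 sequences become U+FFFD instead of producing an invalid &str.
    String::from_utf8_lossy(bytes).into_owned()
}

fn main() {
    let key = b"graph\xFFmeta"; // a binary-safe Redis key that is not valid UTF-8
    let s = unsafe { reply_bytes_to_string(key.as_ptr(), key.len()) };
    assert!(s.contains('\u{FFFD}'));
}
```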
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/redis_type.rs` around lines 428 - 444, The SCAN reply handling uses
unsafe UTF-8 conversions: replace uses of std::str::from_utf8_unchecked for the
cursor (new_cursor) and for key_name (result of RedisModule_CallReplyStringPtr)
with a binary-safe conversion such as std::str::from_utf8_lossy() or work with
the raw bytes (Vec<u8>/ &[u8]) instead; update the code paths around
call_reply_array_element and the RedisModule_CallReplyStringPtr usage so
comparisons and string creation use the lossless/binary-safe representation
rather than unchecked UTF-8.
Actionable comments posted: 2
🧹 Nitpick comments (1)
src/redis_type.rs (1)
700-721: UUID generation is functional but non-standard.
The custom UUID uses timestamp + counter, which provides sufficient uniqueness for virtual key names within a single process. If cross-process uniqueness becomes a concern (e.g., clustered deployments with shared persistence), consider using the uuid crate with proper random sources.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/redis_type.rs` around lines 700 - 721, The custom uuid_v4() function produces non-standard UUIDs based on timestamp+counter which can collide across processes; replace it with a proper UUID v4 generator using the uuid crate: remove or stop using the uuid_v4() function and instead call Uuid::new_v4() (from uuid::Uuid) where UUIDs are needed, add the uuid dependency to Cargo.toml with the v4 feature (e.g., uuid = { version = "1", features = ["v4", "std"] } or similar), and update imports to use uuid::Uuid so generated IDs are RFC‑4122 compliant and safe for cross-process/clustered use.
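A minimal sketch of the suggested swap, assuming the uuid crate (with the v4 feature) is added to Cargo.toml; the helper and key-name format below are purely illustrative, not the module's actual naming scheme.

```rust
// Assumed dependency in Cargo.toml: uuid = { version = "1", features = ["v4"] }
use uuid::Uuid;

/// Hypothetical helper: build a virtual-key name that stays unique across processes,
/// because Uuid::new_v4() draws from the OS RNG rather than a timestamp + counter.
fn virtual_key_name(graph_name: &str) -> String {
    format!("{graph_name}:graphmeta:{}", Uuid::new_v4())
}

fn main() {
    println!("{}", virtual_key_name("mygraph"));
}
```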
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@graph/src/graph/attribute_store.rs`:
- Around line 298-348: preload_from_fjall currently only inserts entities that
have at least one attribute key in the fjall scan, missing zero-attribute
entities; to fix, after the scan compute the maximum entity id seen (or use
whatever tracker you have, e.g. self.next_entity_id/self.next_id) and then
iterate all ids from 1..=max_id (or from min known id to max) and for any id not
present in the cache and not in pending_deletes call
cache.insert_entity_if_older(id, Vec::new(), self.version) to create an
empty-attribute entry; keep the existing scan logic (the loop using
snapshot()/self.keyspace() and insert_entity_if_older for non-empty attrs) but
add this final pass to ensure zero-attribute entities are cached too.
In `@src/redis_type.rs`:
- Around line 133-135: The early silent return when
vkey_state.get_graph_ref(&graph_name) yields None should be replaced with an
explicit error log so state inconsistencies are discoverable: in the block
around vkey_state.get_graph_ref(&graph_name).cloned() (the code creating
graph_arc), emit a descriptive error (including graph_name and the virtual key
identifier / vkey_state context) using the crate's logging/tracing facility
instead of returning silently, then decide whether to continue gracefully or
skip writing this vkey; do not leave the function silently returning without
logging.
---
Nitpick comments:
In `@src/redis_type.rs`:
- Around line 700-721: The custom uuid_v4() function produces non-standard UUIDs
based on timestamp+counter which can collide across processes; replace it with a
proper UUID v4 generator using the uuid crate: remove or stop using the
uuid_v4() function and instead call Uuid::new_v4() (from uuid::Uuid) where UUIDs
are needed, add the uuid dependency to Cargo.toml with the v4 feature (e.g.,
uuid = { version = "1", features = ["v4", "std"] } or similar), and update
imports to use uuid::Uuid so generated IDs are RFC‑4122 compliant and safe for
cross-process/clustered use.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 57f421dd-d84f-4ce9-961a-d3ff225f1323
📒 Files selected for processing (4)
- graph/src/graph/attribute_store.rs
- graph/src/graph/graph.rs
- src/graph_core.rs
- src/redis_type.rs
✅ Files skipped from review due to trivial changes (1)
- src/graph_core.rs
🚧 Files skipped from review as they are similar to previous changes (1)
- graph/src/graph/graph.rs
let Some(graph_arc) = vkey_state.get_graph_ref(&graph_name).cloned() else {
    return;
};
Silent return may cause data corruption on restore.
If get_graph_ref returns None when a virtual key payload exists, the RDB save writes nothing for this key. On restore, Redis will encounter an empty/malformed key. Consider logging an error to aid debugging state inconsistencies.
🛡️ Proposed fix to add error logging
 let Some(graph_arc) = vkey_state.get_graph_ref(&graph_name).cloned() else {
+    eprintln!(
+        "graph rdb_save: missing graph ref for virtual key '{key_name}' (graph '{graph_name}')"
+    );
     return;
 };
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
let Some(graph_arc) = vkey_state.get_graph_ref(&graph_name).cloned() else {
    eprintln!(
        "graph rdb_save: missing graph ref for virtual key '{key_name}' (graph '{graph_name}')"
    );
    return;
};
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/redis_type.rs` around lines 133 - 135, The early silent return when
vkey_state.get_graph_ref(&graph_name) yields None should be replaced with an
explicit error log so state inconsistencies are discoverable: in the block
around vkey_state.get_graph_ref(&graph_name).cloned() (the code creating
graph_arc), emit a descriptive error (including graph_name and the virtual key
identifier / vkey_state context) using the crate's logging/tracing facility
instead of returning silently, then decide whether to continue gracefully or
skip writing this vkey; do not leave the function silently returning without
logging.
Replace preload_from_fjall() (which loaded into quick_cache and could evict under memory pressure) with build_rdb_snapshot() that returns a HashMap holding ALL entity attributes. Snapshots are built in on_persistence(RDB_START) before Redis forks, stored in VKEY_STATE as Arc<RdbSnapshots>, and passed to encode_with_range in the fork child. This ensures the BGSAVE child never accesses fjall (which is unsafe after fork due to mmapped files and internal state), fixing MISCONF errors in benchmarks and timeouts in coverage builds.
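A rough, self-contained sketch of the pattern this commit describes; the type and function names are illustrative, not the crate's actual API. Everything the encoder needs is copied into an ordinary heap-owned map before BGSAVE forks, so the child process never touches fjall's mmapped state.

```rust
use std::collections::HashMap;
use std::sync::Arc;

type EntityId = u64;
type Attrs = Vec<(String, String)>; // assumed attribute representation

/// Pre-built, fork-safe view of every entity's attributes.
struct RdbSnapshots {
    attrs: HashMap<EntityId, Attrs>,
}

/// Built in the persistence-start handler, before Redis forks for BGSAVE.
fn build_rdb_snapshot(entities: impl IntoIterator<Item = (EntityId, Attrs)>) -> Arc<RdbSnapshots> {
    Arc::new(RdbSnapshots {
        attrs: entities.into_iter().collect(),
    })
}

/// In the fork child, the encoder reads only the snapshot, never the on-disk store.
fn encode_entity(snapshots: &RdbSnapshots, id: EntityId) -> Option<&Attrs> {
    snapshots.attrs.get(&id)
}

fn main() {
    let snap = build_rdb_snapshot([(1, vec![("name".into(), "a".into())]), (2, vec![])]);
    assert!(encode_entity(&snap, 1).is_some());
}
```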
Add `save ""` to tests/flow/redis.conf to prevent Redis from auto-triggering BGSAVE during tests. Without this, Redis's default save rules (60s/10000 keys) fire during test_multi_graph (1000 graphs), causing "Background save already in progress" errors when the test tries to SAVE/dumpAndReload. Also fix rustfmt formatting in attribute_store.rs and graph.rs.
Only build pre-fork attribute snapshots for graphs that have a fjall keyspace initialized (cold data on disk). Graphs that fit entirely in the in-memory cache (e.g. the 1000 small graphs in test_multi_graph) skip snapshot construction entirely, avoiding the overhead of iterating cache entries and allocating HashMaps for each graph during the on_persistence(RDB_START) handler. This fixes the test_multi_graph timeout where building snapshots for 1000 graphs took >10s in CI, causing "Background save already in progress" errors.
When populate_cache_from_fjall is called for an entity with no fjall keyspace, it was inserting an empty entry into the quick_cache. For graphs with many property-less entities (like the 1000-graph test with 1.5M total entities), this caused millions of cache inserts in the BGSAVE fork child, triggering COW page faults and degrading performance. Simply return an empty Arc without the cache insert — there is nothing to cache when no persistent storage exists.
♻️ Duplicate comments (3)
src/redis_type.rs (3)
809-820: ⚠️ Potential issue | 🟠 Major
Don't emit an unreadable graphmeta RDB payload.
graphmeta_rdb_save writes nothing, but graphmeta_rdb_load calls rdb_load_graph, which always tries to decode a full graph payload. If a stale graphmeta key ever survives cleanup and reaches a save, the next restore will fail. Make the save/load pair symmetric by writing an explicit sentinel that the loader recognizes.
🛡️ Proposed fix to add sentinel marker
 #[unsafe(no_mangle)]
 unsafe extern "C" fn graphmeta_rdb_save(
     _rdb: *mut RedisModuleIO,
     _value: *mut c_void,
 ) {
-    // Stale graphmeta keys should have been deleted before save.
-    // If we get here, write nothing — the key will be empty.
+    // Write a sentinel marker so graphmeta_rdb_load can detect stale keys.
+    // Stale graphmeta keys should have been deleted before save, but this
+    // provides a safety net.
+    unsafe {
+        raw::save_unsigned(_rdb, 0xDEAD_BEEF); // Sentinel for "stale key"
+    }
 }
Then update graphmeta_rdb_load to check for this sentinel before calling rdb_load_graph.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/redis_type.rs` around lines 809 - 820, graphmeta_rdb_save currently writes nothing which makes a stale graphmeta key unreadable on restore; modify graphmeta_rdb_save to write a small explicit sentinel token (e.g. a single known byte/tag or string constant like SENTINEL_GRAPHMETA) into the RDB so a loader can detect an intentionally-empty graphmeta; then update graphmeta_rdb_load to read and check that sentinel before calling rdb_load_graph and only call rdb_load_graph when the sentinel is not present (or when a different payload marker indicates a real graph), ensuring the save/load pair are symmetric and the loader avoids attempting to decode a non-existent graph.
457-473: ⚠️ Potential issue | 🟠 Major
Use safe UTF-8 conversion for binary-safe SCAN reply data.
Redis keys are binary-safe and can contain non-UTF-8 bytes. Using from_utf8_unchecked() on key names from SCAN replies is undefined behavior if the key contains invalid UTF-8. The same issue exists at lines 622-625 and 635-639.
🛡️ Proposed fix to use safe conversion
-let new_cursor = std::str::from_utf8_unchecked(std::slice::from_raw_parts(
-    cursor_ptr.cast(),
-    cursor_len,
-));
+let new_cursor = String::from_utf8_lossy(std::slice::from_raw_parts(
+    cursor_ptr.cast(),
+    cursor_len,
+));
 let done = new_cursor == "0";

-let key_name =
-    std::str::from_utf8_unchecked(std::slice::from_raw_parts(kptr.cast(), key_len))
-        .to_string();
+let key_name = String::from_utf8_lossy(std::slice::from_raw_parts(kptr.cast(), key_len))
+    .to_string();
Verify each finding against the current code and only fix it if needed. In `@src/redis_type.rs` around lines 457 - 473, The code unsafely calls std::str::from_utf8_unchecked on SCAN reply data (new_cursor and key_name) which may contain non-UTF-8 bytes; replace these unchecked conversions with safe handling: call std::str::from_utf8(...) and handle the Result, using String::from_utf8_lossy(cursor_bytes).to_string() or otherwise preserve the raw bytes (Vec<u8>) when UTF-8 is invalid; update the places using from_utf8_unchecked (e.g., where new_cursor is computed and where key_name is built via RedisModule_CallReplyStringPtr) to use from_utf8 with fallback to from_utf8_lossy or byte-preserving logic so the SCAN reply handling is binary-safe.
133-135: ⚠️ Potential issue | 🟡 Minor
Silent return may cause data corruption on restore.
If get_graph_ref returns None when a virtual key payload exists, the RDB save writes nothing for this key. On restore, Redis will encounter an empty/malformed key. Add error logging to aid debugging state inconsistencies.
🛡️ Proposed fix to add error logging
 let Some(graph_arc) = vkey_state.get_graph_ref(&graph_name).cloned() else {
+    eprintln!(
+        "graph rdb_save: missing graph ref for virtual key '{key_name}' (graph '{graph_name}')"
+    );
     return;
 };
Verify each finding against the current code and only fix it if needed. In `@src/redis_type.rs` around lines 133 - 135, Replace the silent early return in the block that does let Some(graph_arc) = vkey_state.get_graph_ref(&graph_name).cloned() else { return; } with an error log that records the inconsistent state before returning; specifically, call your logger (e.g., error! / processLogger.error) to emit a message containing graph_name and an identifier for the virtual key state (vkey_state) and note that get_graph_ref returned None while a virtual key payload exists, then return. This ensures failures in redis_type.rs (when resolving get_graph_ref) are visible for debugging and audit.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@src/redis_type.rs`:
- Around line 809-820: graphmeta_rdb_save currently writes nothing which makes a
stale graphmeta key unreadable on restore; modify graphmeta_rdb_save to write a
small explicit sentinel token (e.g. a single known byte/tag or string constant
like SENTINEL_GRAPHMETA) into the RDB so a loader can detect an
intentionally-empty graphmeta; then update graphmeta_rdb_load to read and check
that sentinel before calling rdb_load_graph and only call rdb_load_graph when
the sentinel is not present (or when a different payload marker indicates a real
graph), ensuring the save/load pair are symmetric and the loader avoids
attempting to decode a non-existent graph.
- Around line 457-473: The code unsafely calls std::str::from_utf8_unchecked on
SCAN reply data (new_cursor and key_name) which may contain non-UTF-8 bytes;
replace these unchecked conversions with safe handling: call
std::str::from_utf8(...) and handle the Result, using
String::from_utf8_lossy(cursor_bytes).to_string() or otherwise preserve the raw
bytes (Vec<u8>) when UTF-8 is invalid; update the places using
from_utf8_unchecked (e.g., where new_cursor is computed and where key_name is
built via RedisModule_CallReplyStringPtr) to use from_utf8 with fallback to
from_utf8_lossy or byte-preserving logic so the SCAN reply handling is
binary-safe.
- Around line 133-135: Replace the silent early return in the block that does
let Some(graph_arc) = vkey_state.get_graph_ref(&graph_name).cloned() else {
return; } with an error log that records the inconsistent state before
returning; specifically, call your logger (e.g., error! / processLogger.error)
to emit a message containing graph_name and an identifier for the virtual key
state (vkey_state) and note that get_graph_ref returned None while a virtual key
payload exists, then return. This ensures failures in redis_type.rs (when
resolving get_graph_ref) are visible for debugging and audit.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 41976983-b447-4486-a5fe-d362d8cfb1b3
📒 Files selected for processing (6)
- graph/src/graph/attribute_store.rs
- graph/src/graph/graph.rs
- src/redis_type.rs
- src/serializers/encoder/mod.rs
- src/serializers/mod.rs
- tests/flow/redis.conf
✅ Files skipped from review due to trivial changes (1)
- tests/flow/redis.conf
🚧 Files skipped from review as they are similar to previous changes (1)
- graph/src/graph/attribute_store.rs
Two fixes for CI:
1. Use a static EMPTY_ATTRS Arc instead of allocating a new Arc<Vec::new()> per entity in populate_cache_from_fjall. For graphs with 1.5M property-less entities, this eliminates millions of heap allocations during RDB encoding in the fork child.
2. Fix subprocess.Popen in tests/common.py: when executable= is set, args[0] becomes the process name (argv[0]), so --save "" was never passed to Redis. Use the list form [prog, args...] instead, which correctly passes --save "" to disable auto-save in benchmarks.
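A small sketch of fix 1 with assumed types (the real attribute value type differs): one lazily initialized empty vector is shared behind an Arc, so property-less entities only bump a reference count instead of allocating.

```rust
use std::sync::{Arc, LazyLock};

type Attrs = Vec<(String, String)>; // assumed attribute representation

/// Single shared allocation reused for every entity that has no attributes.
static EMPTY_ATTRS: LazyLock<Arc<Attrs>> = LazyLock::new(|| Arc::new(Vec::new()));

fn attrs_or_empty(found: Option<Arc<Attrs>>) -> Arc<Attrs> {
    // Cloning an Arc is a refcount increment; no per-entity heap allocation.
    found.unwrap_or_else(|| Arc::clone(&EMPTY_ATTRS))
}

fn main() {
    let a = attrs_or_empty(None);
    let b = attrs_or_empty(None);
    assert!(Arc::ptr_eq(&a, &b)); // both share the same empty vector
}
```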
Actionable comments posted: 1
🧹 Nitpick comments (2)
tests/common.py (2)
37-39: Consider making the redis-server path configurable.
The redis-server path is hardcoded to /usr/local/bin/redis-server, which may not be portable across different systems. On some Linux distributions, redis-server is located at /usr/bin/redis-server, and on other systems it may be elsewhere. This could cause test failures in different environments.
🔧 Proposed fix to make redis-server path configurable
 def start_redis(release=None, moduleEnvs=[]):
     global redis_server, client, g, shutdown
     port = os.environ.get("PORT", "6379")
+    redis_bin = os.environ.get("REDIS_SERVER", "/usr/local/bin/redis-server")
     if release is None:
         release = True if os.environ.get("RELEASE", "").lower() == "1" else False
     default_target = "target/debug/libfalkordb.so"

     redis_server = subprocess.Popen(
-        ["/usr/local/bin/redis-server",
+        [redis_bin,
          "--save", "", "--port", port, "--logfile", "redis-test.log",
          "--loadmodule", target] + moduleEnvs,
         stdout=subprocess.PIPE)
shutil.which("redis-server")to automatically locate the binary on the system PATH.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/common.py` around lines 37 - 39, The hardcoded redis binary path ("/usr/local/bin/redis-server") in the argument list should be made configurable: replace the literal with a resolved path using os.environ.get("REDIS_SERVER") or shutil.which("redis-server") (with a sensible fallback and a clear error if not found) where the process args are built (the list containing "--save", "--port", port, "--logfile", "redis-test.log", "--loadmodule", target] + moduleEnvs). Ensure the resolved path is used in that list and add a brief diagnostic message if the binary cannot be located.
36-40: The --save "" argument passing is now correct.
The change from using executable= keyword arguments to a single positional argument list properly passes "--save" and "" as separate arguments to redis-server, which fixes the issue described in the PR objectives.
Optional: Consider iterable unpacking for cleaner concatenation.
As suggested by the static analyzer (RUF005), you could use iterable unpacking instead of list concatenation:
✨ Optional style improvement
 redis_server = subprocess.Popen(
-    ["/usr/local/bin/redis-server",
-     "--save", "", "--port", port, "--logfile", "redis-test.log",
-     "--loadmodule", target] + moduleEnvs,
+    ["/usr/local/bin/redis-server",
+     "--save", "", "--port", port, "--logfile", "redis-test.log",
+     "--loadmodule", target, *moduleEnvs],
     stdout=subprocess.PIPE)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/common.py` around lines 36 - 40, The subprocess.Popen call constructing redis_server currently concatenates lists with "+", e.g. the argument list built from the inline list and moduleEnvs; refactor it to use iterable unpacking so the args are built more cleanly and avoid explicit list concatenation—update the redis_server = subprocess.Popen(...) invocation (the argument list containing "/usr/local/bin/redis-server", "--save", "", "--port", port, "--logfile", "redis-test.log", "--loadmodule", target and moduleEnvs) to use unpacking (e.g. using *moduleEnvs or a single unpacked list) when passing to subprocess.Popen while keeping stdout=subprocess.PIPE unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@graph/src/graph/attribute_store.rs`:
- Around line 322-356: The loop uses self.snapshot() which can be stale after
flush_dirty_to_fjall(); refresh the snapshot used for RDB building so flushed
attrs aren’t missed. Replace/augment the scan to take a fresh snapshot from
get_database().snapshot() (or update self.snapshot after flush_dirty_to_fjall())
before iterating in the block that currently calls
self.snapshot().iter(self.keyspace()), ensuring encode_with_range() sees the
up-to-date fjall state; update any callers/fields so the new snapshot is used
for the rest of the RDB build pass.
---
Nitpick comments:
In `@tests/common.py`:
- Around line 37-39: The hardcoded redis binary path
("/usr/local/bin/redis-server") in the argument list should be made
configurable: replace the literal with a resolved path using
os.environ.get("REDIS_SERVER") or shutil.which("redis-server") (with a sensible
fallback and a clear error if not found) where the process args are built (the
list containing "--save", "--port", port, "--logfile", "redis-test.log",
"--loadmodule", target] + moduleEnvs). Ensure the resolved path is used in that
list and add a brief diagnostic message if the binary cannot be located.
- Around line 36-40: The subprocess.Popen call constructing redis_server
currently concatenates lists with "+", e.g. the argument list built from the
inline list and moduleEnvs; refactor it to use iterable unpacking so the args
are built more cleanly and avoid explicit list concatenation—update the
redis_server = subprocess.Popen(...) invocation (the argument list containing
"/usr/local/bin/redis-server", "--save", "", "--port", port, "--logfile",
"redis-test.log", "--loadmodule", target and moduleEnvs) to use unpacking (e.g.
using *moduleEnvs or a single unpacked list) when passing to subprocess.Popen
while keeping stdout=subprocess.PIPE unchanged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 01f76576-8748-477b-9c02-3b65054a4224
📒 Files selected for processing (2)
- graph/src/graph/attribute_store.rs
- tests/common.py
Force-pushed from 6d3784e to 5353d4b
…-saves save "" prevents auto-triggered BGSAVE (which races with explicit bgsave in tests), but also prevents the shutdown save that RLTest's restart_and_reload() relies on. shutdown-on-sigterm save force (Redis 8+) forces a synchronous RDB save on SIGTERM regardless of save rules, so data survives the bgrewriteaof + stop + start cycle.
Force-pushed from 5353d4b to 1ef41be
@copilot resolve the merge conflicts in this pull request
# Conflicts:
#   flow_tests_todo.txt
#   src/lib.rs
Co-authored-by: AviAvni <1097246+AviAvni@users.noreply.github.com>
Resolved the merge conflicts in commit
@copilot resolve the merge conflicts in this pull request
# Conflicts:
#   graph/src/runtime/pending.rs
Co-authored-by: AviAvni <1097246+AviAvni@users.noreply.github.com>
Resolved the merge conflict in commit
…y.py Agent-Logs-Url: https://github.com/FalkorDB/falkordb-rs-next-gen/sessions/bb1e2271-bf3b-4ea3-825e-0f6629d34a50 Co-authored-by: AviAvni <1097246+AviAvni@users.noreply.github.com>
Fixed in commit
Also increased the timeout in

@copilot ci still fail run the test using the devcontainer in an environment same as ci try to reproduce debug and fix the issue
…n test_persistency.py Agent-Logs-Url: https://github.com/FalkorDB/falkordb-rs-next-gen/sessions/2db670cf-47d3-4f66-8aa8-4591a3bc3e71 Co-authored-by: AviAvni <1097246+AviAvni@users.noreply.github.com>
Fixed in commit
The 60-second timeout increase wasn't enough — BGSAVE for 1000 graphs (1M nodes + 500K edges) was still taking longer than 60 seconds in CI. And even when I added an early
Two changes:
Same drain pattern applied to
…scan + HashMap Agent-Logs-Url: https://github.com/FalkorDB/falkordb-rs-next-gen/sessions/ebf11758-3f5c-43ca-9cfb-ef56bdf0034b Co-authored-by: AviAvni <1097246+AviAvni@users.noreply.github.com>
@copilot resolve the merge conflicts in this pull request
Co-authored-by: AviAvni <1097246+AviAvni@users.noreply.github.com>
Resolved in commit
…age and graph operations
Summary
- redis_type.rs: full RDB encode/decode with virtual key splitting, GRAPHMETA_TYPE for C FalkorDB compatibility, on_persistence event handling, finalize_pending_graphs
- lib.rs: register GRAPHMETA_TYPE, GRAPH.EFFECT, GRAPH.DEBUG commands, wire serializers module
- module_init.rs: subscribe to RedisModuleEvent_Persistence for virtual key lifecycle
- debug.rs: GRAPH.DEBUG AUX START/END for testing RDB load boundaries
- query.rs: version check on GRAPH.QUERY to detect stale clients

Tests
- test_rdb_compat.py (new): C-to-Rust and Rust-to-C replication tests via Docker
- test_rdb_load.py: rewritten to use live data + DEBUG AUX instead of hardcoded dumps
- test_persistency.py: updated expectations for new behavior
- Removed test_prev_rdb_decode.py and old dump files

This is the final PR in the #359 split — wires together all previously merged components (serializers, effects buffer, graph encode/decode, effects replication).
Test plan
- cargo check / cargo build --release passes
- TEST=tests/flow/test_rdb_load ./flow.sh
- TEST=tests/flow/test_persistency ./flow.sh
- TEST=tests/flow/test_replication ./flow.sh
- TEST=tests/flow/test_rdb_compat ./flow.sh (requires Docker)

Summary by CodeRabbit
New Features
Improvements
Tests