[compiler-v2] Implement a new abstraction for derived call graph query caching: fixes many subtle bugs. #19486

Merged: vineethk merged 2 commits into main from vk/derived-cached-call-graph on Apr 21, 2026

Conversation

@vineethk vineethk commented Apr 17, 2026

Description

The latent bug: seven cross-function caches lived as separate RefCell<Option<BTreeSet<_>>> fields on each FunctionData. set_function_def invalidated them only on the edited function, but those caches depend on other functions' used_funs / called_funs, so entries on unrelated functions silently went stale. Three invalidation sites (set_function_def, filter_functions, and new cache fields in accessors) each had to remember to clear every relevant field, and one already missed used_structs. add_function_def and add_function_def_from_data did not even do any invalidation, which was wrong.

The fix: pull all seven caches off FunctionData into a single CallGraphCache on GlobalEnv, with one invalidate() method. Every call-graph mutation (set_function_def, add_function_def, add_function_def_from_data, retain_functions) calls it. Easy to extend, centralized location to invalidate, no per-cache book-keeping.
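
To make the shape of the fix concrete, here is a minimal sketch on a toy model. The names `Env` and `FunId`, the single `calling_funs` cache field, and the method signatures are assumptions for illustration only; the real GlobalEnv, FunctionData, and CallGraphCache in move-model are richer and are not reproduced here.

```rust
use std::cell::RefCell;
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical stand-in for a function id; not the move-model type.
type FunId = u32;

/// All derived (cross-function) query results live in one place.
#[derive(Default)]
struct CallGraphCache {
    // Inverse edges: callee -> set of callers.
    calling_funs: BTreeMap<FunId, BTreeSet<FunId>>,
}

impl CallGraphCache {
    /// Resetting the whole struct also covers fields added later.
    fn invalidate(&mut self) {
        *self = Self::default();
    }
}

/// Toy environment: direct call edges plus the centralized cache.
#[derive(Default)]
struct Env {
    called_funs: BTreeMap<FunId, BTreeSet<FunId>>,
    cache: RefCell<CallGraphCache>,
}

impl Env {
    /// A call-graph mutation: always invalidates the shared cache.
    fn set_function_def(&mut self, fun: FunId, callees: &[FunId]) {
        self.called_funs.insert(fun, callees.iter().copied().collect());
        self.cache.get_mut().invalidate();
    }

    /// A derived query, memoized through the shared cache.
    fn get_calling_functions(&self, fun: FunId) -> BTreeSet<FunId> {
        // The shared borrow is released at the end of this statement.
        let hit = self.cache.borrow().calling_funs.get(&fun).cloned();
        if let Some(callers) = hit {
            return callers;
        }
        let callers: BTreeSet<FunId> = self
            .called_funs
            .iter()
            .filter(|(_, callees)| callees.contains(&fun))
            .map(|(caller, _)| *caller)
            .collect();
        self.cache.borrow_mut().calling_funs.insert(fun, callers.clone());
        callers
    }
}
```

In this sketch, editing function 2 so that it calls function 3 changes the answer to `get_calling_functions(3)`, even though function 3 itself was never edited. That is the cross-function dependency a per-function cache cannot see, and the reason the mutation clears the whole cache.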

How Has This Been Tested?

Existing tests.

Type of Change

  • Bug fix

Which Components or Systems Does This Change Impact?

  • Move Compiler

Note

Medium Risk
Touches core Move model call-graph query/memoization and invalidation paths; mistakes could cause incorrect compiler analyses or missed references, though the change is mostly an internal refactor with centralized invalidation.

Overview
Refactors derived call-graph query caching to live in a new GlobalEnv::call_graph_cache (CallGraphCache) instead of per-FunctionData RefCell<Option<…>> fields, eliminating stale cross-function cache bugs.

All call-graph mutation points now invalidate the centralized cache (add module, attach_compiled_module, set_function_def, add_function_def, add_function_def_from_data, and the new retain_functions which replaces filter_functions and also prunes removed ids from used_funs/called_funs). Call-graph query accessors (get_using_functions, transitive closure helpers, and inline-expanded variants) are updated to memoize through CallGraphCache.

Compiler v2 inliner is updated to call env.retain_functions when dropping inline functions with bodies.

Reviewed by Cursor Bugbot for commit 7e2d1e7.


vineethk commented Apr 17, 2026

@vineethk vineethk changed the title from "Implement a new abstraction for derived call graph query caching: fixes many subtle bugs." to "[compiler-v2] Implement a new abstraction for derived call graph query caching: fixes many subtle bugs." Apr 17, 2026
@vineethk vineethk marked this pull request as ready for review April 17, 2026 16:20
@vineethk vineethk requested a review from Copilot April 17, 2026 16:30

@cursor cursor Bot left a comment


Differential Security Review — PR #19486

Scope: 6badec40..fa81873f — compiler-v2 call-graph query cache refactor
Reviewer: Automated differential review

Executive Summary

| Severity | Count |
|---|---|
| CRITICAL | 0 |
| HIGH | 0 |
| MEDIUM | 1 |
| LOW | 2 |

Overall risk: Low–Medium. The architectural change is sound and correctly fixes the stale-cache bugs described in the PR. Two latent invalidation gaps of the exact same class remain: attach_compiled_module and GlobalEnv::add both mutate used_funs/called_funs without calling invalidate(). Neither has a current exploit path (both occur before query phases in today's pipeline), but they are future maintenance traps.

Recommendation: REVIEW BEFORE MERGE


What Changed

Files changed: 3 (3 Rust, 0 Move) | Lines: +224 / -203

| Module | Files Changed | Risk Level |
|---|---|---|
| third_party/move/move-model/src/model.rs | 1 | Medium |
| third_party/move/move-model/src/builder/module_builder.rs | 1 | Low |
| third_party/move/move-compiler-v2/src/env_pipeline/inliner.rs | 1 | Low |

Core change: 7 per-FunctionData RefCell<Option<BTreeSet<_>>> cache fields are pulled off each function's struct and consolidated into a single CallGraphCache on GlobalEnv. A single invalidate() method (which resets the whole struct via *self = Self::default()) replaces all per-field null-outs. Four mutation sites (set_function_def, add_function_def, add_function_def_from_data, retain_functions) now call invalidate(). filter_functions is renamed to retain_functions and is updated to prune both used_funs and called_funs (the old code only pruned used_funs and the now-removed derived using_funs).
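
The pruning behaviour described for retain_functions can be sketched on a toy model as follows. This is an illustration under stated assumptions, not the code from model.rs: `FunId`, the simplified `FunData`, and the free-function signature are hypothetical, and the cache invalidation that the real method performs is only noted in a comment.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical stand-in for a function id; not the move-model type.
type FunId = u32;

/// Simplified per-function data: only the two edge sets matter here.
#[derive(Default)]
struct FunData {
    used_funs: BTreeSet<FunId>,
    called_funs: BTreeSet<FunId>,
}

/// Keep only functions accepted by `keep`, and drop every edge that
/// points at a removed function from the survivors' edge sets.
fn retain_functions(funs: &mut BTreeMap<FunId, FunData>, keep: impl Fn(FunId) -> bool) {
    // Work out which ids are going away before mutating anything.
    let removed: BTreeSet<FunId> = funs
        .keys()
        .copied()
        .filter(|id| !keep(*id))
        .collect();
    funs.retain(|id, _| !removed.contains(id));
    for data in funs.values_mut() {
        // Prune both sets; pruning only used_funs would leave
        // dangling ids behind in called_funs.
        data.used_funs.retain(|f| !removed.contains(f));
        data.called_funs.retain(|f| !removed.contains(f));
    }
    // In the PR, the centralized call-graph cache is invalidated at
    // this point as well; the toy model has no cache to clear.
}
```

For example, if function 1 uses and calls functions 2 and 3, then retaining everything except 2 leaves function 1 with only 3 in both sets.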


Findings

[MEDIUM] attach_compiled_module writes used_funs/called_funs without invalidating the cache

File: third_party/move/move-model/src/model.rs:1843–1848
Blast radius: Any call-graph query (get_calling_functions, get_using_functions, transitive closures, inline-expanded variants) made after attach_compiled_module returns stale results if the cache was populated before the call.
Test coverage: Untested — no test exercises call-graph queries before and after attach_compiled_module on the same env.

Description: attach_compiled_module is a public &mut self method that directly writes to fun_data.used_funs and fun_data.called_funs (lines 1843–1848). This is exactly the category of mutation the PR centralizes under invalidate(). The method ends without touching self.call_graph_cache.

The four sites the PR does protect are:

  • set_function_def → invalidate()
  • add_function_def → invalidate()
  • add_function_def_from_data → invalidate()
  • retain_functions → invalidate()

attach_compiled_module is absent from this list even though it performs the same writes.

Concrete impact: In the current compiler pipeline, attach_compiled_module is called by the file format generator after all env-pipeline passes complete, and call-graph queries are made during those passes. As long as no pass that runs after attach_compiled_module queries the cache, results are correct today. However:

  1. This is an implicit ordering contract that is not enforced by the type system or any assertion.
  2. Any future pipeline stage (prover, linter, docgen) inserted after file-format generation that queries the call-graph would silently read stale results — the same class of subtle bug the PR is fixing.
  3. The binary module loader path (binary_module_loader.rs:94) also calls attach_compiled_module during initial model construction; here the cache is empty so there is no stale read, but the gap is still a maintenance hazard.

Historical context: The PR description explicitly names add_function_def and add_function_def_from_data as sites that "did not even do any invalidation, which was wrong." attach_compiled_module belongs in the same list.


[LOW] GlobalEnv::add inserts functions with call edges without invalidating the cache

File: third_party/move/move-model/src/model.rs:1645–1721
Blast radius: Low — add() is called during initial model construction only.
Test coverage: Untested for post-construction call-graph consistency.

Description: GlobalEnv::add appends a complete new module (with pre-built function_data that can include non-None used_funs/called_funs) to self.module_data. New functions introduce new edges into the call graph. If any cache entry was populated before this call, inverse queries (get_calling_functions, get_using_functions) for previously-seen functions would miss the newly-added callers/users.

add_function_def (which adds a single function to an existing module) correctly calls invalidate(). The bulk-add path via add() does not.

Concrete impact: In practice, all add() calls happen during model construction before any query. No current exploit path exists. This is a latent pattern gap consistent with the two sites the PR explicitly fixed (add_function_def_from_data is the closest analog — it also adds a complete FunctionData to an existing module and does call invalidate()).


[LOW] No targeted tests for cache invalidation correctness

Test coverage: The PR description states "Existing tests." No new tests verify that call-graph queries return correct results across the four (now five, counting attach_compiled_module) mutation sites.

Concrete impact: The original bugs (stale caches per the PR description) were present for some time without being caught by tests. Without regression tests that explicitly:

  1. Populate the cache via a query,
  2. Call a mutation site,
  3. Re-query and assert correctness,

a future regression (e.g., a new mutation site added without invalidate()) may again go undetected.

This is a test-gap observation, not a code bug.
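
The three-step pattern above could be packaged as a small helper. The sketch below runs on a toy environment, not on GlobalEnv: `ToyEnv`, `FunId`, `callers`, `callers_uncached`, and `mutation_keeps_queries_fresh` are all hypothetical names introduced for illustration.

```rust
use std::cell::RefCell;
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical stand-in for a function id; not the move-model type.
type FunId = u32;

#[derive(Default)]
struct ToyEnv {
    called_funs: BTreeMap<FunId, BTreeSet<FunId>>,
    // Cached inverse call graph (callee -> callers); None means "not computed".
    callers_cache: RefCell<Option<BTreeMap<FunId, BTreeSet<FunId>>>>,
}

impl ToyEnv {
    /// A well-behaved mutation site: edits edges, then invalidates.
    fn add_function_def(&mut self, fun: FunId, callees: &[FunId]) {
        self.called_funs.insert(fun, callees.iter().copied().collect());
        *self.callers_cache.get_mut() = None;
    }

    /// Ground truth, recomputed from the direct edges on every call.
    fn callers_uncached(&self, fun: FunId) -> BTreeSet<FunId> {
        self.called_funs
            .iter()
            .filter(|(_, callees)| callees.contains(&fun))
            .map(|(caller, _)| *caller)
            .collect()
    }

    /// Memoized query: builds the inverse graph once, then reuses it.
    fn callers(&self, fun: FunId) -> BTreeSet<FunId> {
        let mut cache = self.callers_cache.borrow_mut();
        let inverse = cache.get_or_insert_with(|| {
            let mut inv: BTreeMap<FunId, BTreeSet<FunId>> = BTreeMap::new();
            for (caller, callees) in &self.called_funs {
                for callee in callees {
                    inv.entry(*callee).or_default().insert(*caller);
                }
            }
            inv
        });
        let result = inverse.get(&fun).cloned().unwrap_or_default();
        result
    }
}

/// Returns true if the cached query still agrees with a fresh
/// recomputation after `mutate` has run.
fn mutation_keeps_queries_fresh(
    env: &mut ToyEnv,
    probe: FunId,
    mutate: impl FnOnce(&mut ToyEnv),
) -> bool {
    let _ = env.callers(probe); // 1. populate the cache via a query
    mutate(&mut *env); // 2. call the mutation site under test
    env.callers(probe) == env.callers_uncached(probe) // 3. re-query and compare
}
```

A mutation that goes through `add_function_def` passes this check, while one that writes `called_funs` directly and skips invalidation fails it, which is the kind of regression the observation is concerned about.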


Blast Radius

| Changed Function | Non-Test Callers | Classification |
|---|---|---|
| CallGraphCache::invalidate (new) | 4 | LOW: called only from mutation sites |
| retain_functions (renamed from filter_functions) | 1 (inliner.rs) | LOW: single caller, updated |
| FunctionData struct (7 fields removed) | All constructors | LOW: all sites covered in diff |
| get_calling_functions / get_using_functions / transitive closures | 11 call sites across linter, prover, docgen, compiler | MEDIUM: semantics preserved |

Highest-risk dependency chain: inliner.rs::run_inlining → retain_functions → call_graph_cache.invalidate(). This path correctly invalidates after removing inline functions. Beyond the rename from filter_functions to retain_functions, the method now also prunes called_funs; the single caller is updated.


Positive Observations

  • Struct reset approach for invalidate() (*self = Self::default()) is the right pattern: any future field added to CallGraphCache is automatically invalidated without touching invalidate().
  • cached / cached_opt split is semantically correct. cached_opt does not store None results (when used_funs data is absent for some function), so failed-computation functions are re-attempted on next access rather than being incorrectly pinned as "empty".
  • No double-borrow panics: The RefCell::borrow() guard is released before borrow_mut() is taken in both cached and cached_opt. Recursive calls from within a closure (e.g., get_using_functions called inside get_using_functions_with_transitive_inline's closure) touch different RefCell fields, so there is no runtime-borrow-violation risk.
  • retain_functions now correctly prunes called_funs: The old filter_functions only pruned used_funs and the derived using_funs. Pruning called_funs on surviving functions is a functional bug fix that matches the stated intent.
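
The cached / cached_opt split and the borrow discipline noted above can be sketched as two generic helpers. The signatures here are assumptions made for illustration (a RefCell-wrapped BTreeMap slot keyed by an arbitrary key); the actual helpers in model.rs may be shaped differently.

```rust
use std::cell::RefCell;
use std::collections::BTreeMap;

/// Memoize an infallible computation in `slot` under `key`.
/// Hypothetical helper; not the signature used in move-model.
fn cached<K: Ord, V: Clone>(
    slot: &RefCell<BTreeMap<K, V>>,
    key: K,
    compute: impl FnOnce() -> V,
) -> V {
    // The shared borrow ends with this statement, before compute runs.
    let hit = slot.borrow().get(&key).cloned();
    if let Some(value) = hit {
        return value;
    }
    // No borrow is held here, so compute may itself query the cache.
    let value = compute();
    slot.borrow_mut().insert(key, value.clone());
    value
}

/// Memoize a fallible computation, storing only successful results.
fn cached_opt<K: Ord, V: Clone>(
    slot: &RefCell<BTreeMap<K, V>>,
    key: K,
    compute: impl FnOnce() -> Option<V>,
) -> Option<V> {
    let hit = slot.borrow().get(&key).cloned();
    if hit.is_some() {
        return hit;
    }
    // A None result returns early and is never stored, so the
    // computation is attempted again on the next access.
    let value = compute()?;
    slot.borrow_mut().insert(key, value.clone());
    Some(value)
}
```

Because the read guard is dropped before `compute` is invoked and the write guard is taken only afterwards, a nested call such as `cached(&slot, 1, || cached(&slot, 2, || 5) + 1)` does not trip RefCell's runtime borrow check in this sketch.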

Sent by Cursor Automation: Security Review Bot


Copilot AI left a comment


Pull request overview

This PR fixes stale call-graph query results in the Move model by centralizing cross-function derived call-graph memoization in a single GlobalEnv cache with centralized invalidation, instead of scattered per-FunctionData caches that were easy to miss during mutations.

Changes:

  • Introduces GlobalEnv::call_graph_cache: CallGraphCache to hold all derived call-graph caches (inverse edges, transitive closures, inline-expanded variants) and adds a single invalidate() path.
  • Updates call-graph mutation APIs (set_function_def, add_function_def, add_function_def_from_data, and filter_functions → retain_functions) to invalidate the centralized cache and prune edges on retention.
  • Updates call-graph query accessors to read/write through the new centralized cache, and updates compiler-v2 inliner to call retain_functions.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
|---|---|
| third_party/move/move-model/src/model.rs | Adds CallGraphCache to GlobalEnv, removes per-function caches, rewires call-graph query memoization, and updates invalidation/retention logic. |
| third_party/move/move-model/src/builder/module_builder.rs | Updates FunctionData construction to match the removal of per-function cache fields. |
| third_party/move/move-compiler-v2/src/env_pipeline/inliner.rs | Switches from filter_functions to retain_functions when removing inline functions. |


@wrwg wrwg left a comment


Nice

@vineethk vineethk force-pushed the vk/derived-cached-call-graph branch from 00fa968 to 61b3e74 Compare April 20, 2026 21:23
@vineethk vineethk force-pushed the vk/fix-needless-visible-lint branch from b5ef59b to e09d9c2 Compare April 20, 2026 21:23
Base automatically changed from vk/fix-needless-visible-lint to main April 20, 2026 22:05
@vineethk vineethk force-pushed the vk/derived-cached-call-graph branch from 61b3e74 to 546650b Compare April 21, 2026 00:26
@vineethk vineethk enabled auto-merge (squash) April 21, 2026 00:26
@vineethk vineethk force-pushed the vk/derived-cached-call-graph branch from 546650b to 98ec2bd Compare April 21, 2026 01:00
@vineethk vineethk force-pushed the vk/derived-cached-call-graph branch from 98ec2bd to 7e2d1e7 Compare April 21, 2026 14:27
@github-actions

✅ Forge suite realistic_env_max_load success on 7e2d1e78f4120615300b0e2732b36d881dd05cf5

two traffics test: inner traffic : committed: 13846.35 txn/s, latency: 1300.50 ms, (p50: 1200 ms, p70: 1300, p90: 1500 ms, p99: 1900 ms), latency samples: 5171640
two traffics test : committed: 99.99 txn/s, latency: 647.51 ms, (p50: 600 ms, p70: 700, p90: 800 ms, p99: 1000 ms), latency samples: 1720
Latency breakdown for phase 0: ["MempoolToBlockCreation: max: 0.483, avg: 0.457", "ConsensusProposalToOrdered: max: 0.127, avg: 0.120", "ConsensusOrderedToCommit: max: 0.183, avg: 0.155", "ConsensusProposalToCommit: max: 0.302, avg: 0.275"]
Max non-epoch-change gap was: 1 rounds at version 17048 (avg 0.00) [limit 4], 1.11s no progress at version 17048 (avg 0.06s) [limit 15].
Max epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 0.34s no progress at version 2586634 (avg 0.34s) [limit 16].
Test Ok


@github-actions

✅ Forge suite framework_upgrade success on ca049383dd80675149ef2d0042668964f9f9107a ==> 7e2d1e78f4120615300b0e2732b36d881dd05cf5

Compatibility test results for ca049383dd80675149ef2d0042668964f9f9107a ==> 7e2d1e78f4120615300b0e2732b36d881dd05cf5 (PR)
Upgrade the nodes to version: 7e2d1e78f4120615300b0e2732b36d881dd05cf5
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 2356.31 txn/s, submitted: 2363.40 txn/s, failed submission: 7.09 txn/s, expired: 7.09 txn/s, latency: 1227.33 ms, (p50: 1200 ms, p70: 1200, p90: 1700 ms, p99: 2900 ms), latency samples: 212721
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1843.52 txn/s, submitted: 1849.22 txn/s, failed submission: 5.70 txn/s, expired: 5.70 txn/s, latency: 1680.15 ms, (p50: 1200 ms, p70: 1500, p90: 2300 ms, p99: 12700 ms), latency samples: 168040
5. check swarm health
Compatibility test for ca049383dd80675149ef2d0042668964f9f9107a ==> 7e2d1e78f4120615300b0e2732b36d881dd05cf5 passed
Upgrade the remaining nodes to version: 7e2d1e78f4120615300b0e2732b36d881dd05cf5
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1694.11 txn/s, submitted: 1700.98 txn/s, failed submission: 6.87 txn/s, expired: 6.87 txn/s, latency: 1832.95 ms, (p50: 1200 ms, p70: 1700, p90: 3000 ms, p99: 11500 ms), latency samples: 152942
Test Ok


@github-actions

✅ Forge suite compat success on ca049383dd80675149ef2d0042668964f9f9107a ==> 7e2d1e78f4120615300b0e2732b36d881dd05cf5

Compatibility test results for ca049383dd80675149ef2d0042668964f9f9107a ==> 7e2d1e78f4120615300b0e2732b36d881dd05cf5 (PR)
1. Check liveness of validators at old version: ca049383dd80675149ef2d0042668964f9f9107a
compatibility::simple-validator-upgrade::liveness-check : committed: 14367.74 txn/s, latency: 2415.29 ms, (p50: 2400 ms, p70: 2700, p90: 3300 ms, p99: 4200 ms), latency samples: 475100
2. Upgrading first Validator to new version: 7e2d1e78f4120615300b0e2732b36d881dd05cf5
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 6066.79 txn/s, latency: 5518.26 ms, (p50: 6100 ms, p70: 6200, p90: 6300 ms, p99: 6300 ms), latency samples: 213680
3. Upgrading rest of first batch to new version: 7e2d1e78f4120615300b0e2732b36d881dd05cf5
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 6404.61 txn/s, latency: 5321.03 ms, (p50: 5800 ms, p70: 6000, p90: 6100 ms, p99: 6200 ms), latency samples: 223000
4. upgrading second batch to new version: 7e2d1e78f4120615300b0e2732b36d881dd05cf5
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 11039.43 txn/s, latency: 2940.74 ms, (p50: 3100 ms, p70: 3200, p90: 3400 ms, p99: 3500 ms), latency samples: 359800
5. check swarm health
Compatibility test for ca049383dd80675149ef2d0042668964f9f9107a ==> 7e2d1e78f4120615300b0e2732b36d881dd05cf5 passed
Test Ok

@vineethk vineethk merged commit d601270 into main Apr 21, 2026
75 of 76 checks passed
@vineethk vineethk deleted the vk/derived-cached-call-graph branch April 21, 2026 15:23