[mono-vm] Factor specializer pipeline out of the executable and into the specialize crate.#19454
[mono-vm] Factor specializer pipeline out of the executable and into the specialize crate.#19454
Conversation
There was a problem hiding this comment.
Pull request overview
This PR factors the specializer pipeline (destack → lower → gas instrumentation → frame layout computation) out of global-context’s executable builder and into the specializer crate, while also removing unnecessary CompiledModule cloning by making the IR borrow the module.
Changes:
- Introduces
specializer::destack_and_lower_module()and newLoweredFunction/LoweredModuletypes to encapsulate the full lowering pipeline. - Updates stackless exec IR (
ModuleIR) anddestack()to borrow&CompiledModuleinstead of owning it (eliminating clones across callers). - Adjusts callers (tests, CLI, executable builder) and Cargo dependencies accordingly.
Reviewed changes
Copilot reviewed 12 out of 14 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| third_party/move/mono-move/specializer/tests/testsuite.rs | Updates tests to use borrowed ModuleIR and destack(&module, ...). |
| third_party/move/mono-move/specializer/src/stackless_exec_ir/mod.rs | Makes ModuleIR borrow &CompiledModule via a lifetime parameter. |
| third_party/move/mono-move/specializer/src/stackless_exec_ir/display.rs | Updates Display impl to match ModuleIR<'_>. |
| third_party/move/mono-move/specializer/src/pipeline.rs | Adds new high-level pipeline destack_and_lower_module() producing lowered+instrumented functions and frame sizes. |
| third_party/move/mono-move/specializer/src/lower/mod.rs | Adds public LoweredFunction/LoweredModule result types. |
| third_party/move/mono-move/specializer/src/lib.rs | Exposes pipeline module and re-exports new API/types. |
| third_party/move/mono-move/specializer/src/destack/translate.rs | Updates translation to accept &CompiledModule and return ModuleIR<'m>. |
| third_party/move/mono-move/specializer/src/destack/optimize.rs | Updates optimizer signature for ModuleIR<'_>. |
| third_party/move/mono-move/specializer/src/destack/mod.rs | Updates destack() to accept &CompiledModule and return ModuleIR<'m>. |
| third_party/move/mono-move/specializer/src/bin/mseir-compiler.rs | Updates CLI to call destack(&module, ...) and use ModuleIR<'_>. |
| third_party/move/mono-move/specializer/Cargo.toml | Adds mono-move-gas dependency for pipeline instrumentation. |
| third_party/move/mono-move/global-context/src/context/executable.rs | Replaces inlined pipeline with specializer::destack_and_lower_module(). |
| third_party/move/mono-move/global-context/Cargo.toml | Removes unused deps; moves mono-move-gas to dev-dependencies for tests. |
| Cargo.lock | Lockfile updates for dependency graph changes. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| let module_ir = destack(module, &struct_name_table)?; | ||
| let func_id_map = build_func_id_map(module_ir.module); | ||
|
|
||
| let mut functions = Vec::with_capacity(module_ir.functions.len()); | ||
| for func_ir in &module_ir.functions { | ||
| let lowered = | ||
| match try_build_context(module_ir.module, func_ir, &func_id_map)? { | ||
| Some(ctx) => { | ||
| let micro_ops = lower_function(func_ir, &ctx)?; | ||
| let code = GasInstrumentor::new(MicroOpGasSchedule).run(micro_ops); | ||
|
|
||
| let args_size = ctx.home_slots[..func_ir.num_params as usize] | ||
| .iter() | ||
| .map(|s| s.size as usize) | ||
| .sum::<usize>(); | ||
| let args_and_locals_size = ctx.frame_data_size as usize; | ||
| let extended_frame_size = ctx | ||
| .call_sites | ||
| .iter() | ||
| .flat_map(|cs| cs.arg_write_slots.iter().chain(cs.ret_read_slots.iter())) | ||
| .map(|s| (s.offset + s.size) as usize) | ||
| .max() | ||
| // Leaf function: no callee slots needed beyond metadata. | ||
| .unwrap_or(args_and_locals_size + FRAME_METADATA_SIZE); | ||
|
|
||
| Some(LoweredFunction { | ||
| name_idx: func_ir.name_idx, | ||
| code, | ||
| args_size, | ||
| args_and_locals_size, | ||
| extended_frame_size, | ||
| }) | ||
| }, | ||
| None => None, | ||
| }; | ||
| functions.push(lowered); | ||
| } |
There was a problem hiding this comment.
LoweredModule.functions is treated as being indexed by Move FunctionDefinition index (it’s later passed into Function::resolve_calls, where func_id is a definition index). However, the destack IR only contains entries for non-native definitions (fdef.code.is_some()), so this vector will be shorter / misaligned if the module contains native function defs, leading to incorrect patching or out-of-bounds indexing. Consider constructing LoweredModule.functions with length module.function_defs.len() and placing each lowered function at its real definition index (e.g., via func_id_map[func_ir.handle_idx.0]), or explicitly rejecting modules with native function defs here if they’re unsupported.
| /// Run the full specializer pipeline: destack → lower → gas instrument → frame layout. | ||
| /// | ||
| /// Takes a reference to the compiled module — no cloning required. | ||
| pub fn destack_and_lower_module(module: &CompiledModule) -> Result<LoweredModule> { | ||
| // Identity mapping: valid when loading a single module in isolation. | ||
| let struct_name_table: Vec<StructNameIndex> = (0..module.struct_handles.len()) | ||
| .map(|i| StructNameIndex::new(i as u32)) | ||
| .collect(); | ||
| let module_ir = destack(module, &struct_name_table)?; | ||
| let func_id_map = build_func_id_map(module_ir.module); | ||
|
|
||
| let mut functions = Vec::with_capacity(module_ir.functions.len()); | ||
| for func_ir in &module_ir.functions { | ||
| let lowered = | ||
| match try_build_context(module_ir.module, func_ir, &func_id_map)? { | ||
| Some(ctx) => { | ||
| let micro_ops = lower_function(func_ir, &ctx)?; | ||
| let code = GasInstrumentor::new(MicroOpGasSchedule).run(micro_ops); | ||
|
|
||
| let args_size = ctx.home_slots[..func_ir.num_params as usize] | ||
| .iter() | ||
| .map(|s| s.size as usize) | ||
| .sum::<usize>(); | ||
| let args_and_locals_size = ctx.frame_data_size as usize; | ||
| let extended_frame_size = ctx | ||
| .call_sites | ||
| .iter() | ||
| .flat_map(|cs| cs.arg_write_slots.iter().chain(cs.ret_read_slots.iter())) | ||
| .map(|s| (s.offset + s.size) as usize) | ||
| .max() | ||
| // Leaf function: no callee slots needed beyond metadata. | ||
| .unwrap_or(args_and_locals_size + FRAME_METADATA_SIZE); | ||
|
|
||
| Some(LoweredFunction { | ||
| name_idx: func_ir.name_idx, | ||
| code, | ||
| args_size, | ||
| args_and_locals_size, | ||
| extended_frame_size, | ||
| }) |
There was a problem hiding this comment.
New pipeline orchestration (lowering + gas instrumentation + frame size computation) is introduced here, but there’s no direct test exercising destack_and_lower_module end-to-end. Adding a focused test that asserts the returned LoweredModule has expected args_size/extended_frame_size invariants (and that gas instrumentation runs) would help catch regressions in this high-level entry point.
| // Indexed by definition index. Generic functions that are not | ||
| // lowered leave their slot as None. | ||
| let mut func_ptrs = vec![None; module_ir.functions.len()]; | ||
| for (def_idx, func_ir) in module_ir.functions.iter().enumerate() { | ||
| let name = module_ir.module.identifier_at(func_ir.name_idx); | ||
| let name = self.guard.intern_identifier_internal(name); | ||
|
|
||
| // TODO: support generic functions. | ||
| if let Some(ctx) = | ||
| specializer::lower::try_build_context(&module_ir.module, func_ir, &func_id_map)? | ||
| { | ||
| let micro_ops = specializer::lower::lower_function(func_ir, &ctx)?; | ||
| let micro_ops = GasInstrumentor::new(MicroOpGasSchedule).run(micro_ops); | ||
|
|
||
| // Compute frame layout. | ||
| let args_size = ctx.home_slots[..func_ir.num_params as usize] | ||
| .iter() | ||
| .map(|s| s.size as usize) | ||
| .sum::<usize>(); | ||
| let args_and_locals_size = ctx.frame_data_size as usize; | ||
| let extended_frame_size = ctx | ||
| .call_sites | ||
| .iter() | ||
| .flat_map(|cs| cs.arg_write_slots.iter().chain(cs.ret_read_slots.iter())) | ||
| .map(|s| (s.offset + s.size) as usize) | ||
| .max() | ||
| .unwrap_or(args_and_locals_size + FRAME_METADATA_SIZE); | ||
|
|
||
| // Allocate micro-ops and frame layout in the executable arena. | ||
| let code = self.arena.alloc_slice_fill_iter(micro_ops); | ||
| let mut func_ptrs = vec![None; lowered.functions.len()]; | ||
| for (def_idx, lowered_fn) in lowered.functions.into_iter().enumerate() { | ||
| if let Some(lf) = lowered_fn { |
There was a problem hiding this comment.
The code/comment assumes func_ptrs is indexed by Move FunctionDefinition index, but its length comes from lowered.functions.len(). If LoweredModule.functions is not exactly aligned to definition indices (e.g., if the module contains native function defs that were skipped during destack/lowering), Function::resolve_calls can index the wrong slot or panic. Ensure LoweredModule.functions is definition-indexed (same length as module.function_defs) or add validation that this invariant holds before calling resolve_calls.
There was a problem hiding this comment.
Differential Security Review — PR #19454: Factor specializer pipeline into the specialize crate
Scope: f1e65d03a5..f2f9929072 · 1 commit · 14 files · +126 / -65
Codebase: third_party/move/mono-move/ — experimental mono-VM subsystem (not yet in consensus or execution hot paths)
Strategy: Full trace (< 15 files, mixed risk)
Executive Summary
| Severity | Count |
|---|---|
| CRITICAL | 0 |
| HIGH | 0 |
| MEDIUM | 0 |
| LOW | 2 |
Overall risk: Low. This is a clean refactor — pipeline orchestration is extracted from executable.rs into specializer::pipeline, ownership semantics shift from CompiledModule (owned) to &CompiledModule (borrowed), and a .clone() is eliminated. No validation logic is added, removed, or altered. No security-relevant invariants change.
Recommendation: APPROVE WITH NOTES
What Changed
| Module | Files | Risk |
|---|---|---|
specializer/src/pipeline.rs |
1 (new) | Low — extraction of existing logic |
specializer/src/lower/mod.rs |
1 | Low — new LoweredFunction/LoweredModule types |
specializer/src/destack/ |
3 | Low — lifetime annotations only (CompiledModule → &'m CompiledModule) |
specializer/src/stackless_exec_ir/ |
2 | Low — lifetime propagation to ModuleIR<'m> |
global-context/src/context/executable.rs |
1 | Low — call-site updated to use new API |
Cargo.toml files |
2 | Low — dependency scope corrections |
Findings
[LOW] Finding 1: destack_and_lower_module has no dedicated test in the specializer crate
File: third_party/move/mono-move/specializer/src/pipeline.rs
Blast radius: 1 non-test caller (global-context::ExecutableBuilder::build)
Test coverage: Untested within specializer; exercised indirectly via global-context integration tests (gas_test.rs, types_tests.rs)
Description: destack_and_lower_module is the new primary public API exported from the specializer crate for the full compilation pipeline (destack → lower → gas instrument → frame layout). The existing specializer test suite (testsuite.rs) exercises destack() directly but has no test runner that calls destack_and_lower_module() or verifies LoweredModule/LoweredFunction output. This means frame-size arithmetic regressions in the pipeline would not be caught by cargo test -p specializer alone.
Concrete impact: Test gap, not a code bug. The integration tests in mono-move-global-context do exercise the full pipeline and would catch gross errors, but fine-grained assertions on args_size, args_and_locals_size, and extended_frame_size are absent from the unit test suite.
[LOW] Finding 2: Identity struct_name_table assumption not documented in the public API
File: third_party/move/mono-move/specializer/src/pipeline.rs:21-24
Blast radius: All future callers of destack_and_lower_module
Description: The function builds the struct_name_table with an identity mapping (StructHandleIndex ordinal i → StructNameIndex(i)), which is semantically correct only when loading a single module in isolation (i.e., all struct handles in the module map 1:1 to the module's own definitions). The inline comment documents this correctly, but the public doc-comment on destack_and_lower_module does not state this restriction. As the codebase grows toward multi-module loading, a caller could invoke this function on a module that cross-references structs from other already-loaded modules and silently receive incorrect Type representations.
Concrete impact: Not exploitable in the current codebase — the only caller is ExecutableBuilder::build, which enforces single-module loading. This is also a pre-existing constraint (the same identity mapping existed in executable.rs before this PR). The finding is a documentation gap, not a code bug.
Semantic Correctness Notes
Behavioral equivalence confirmed: The old inlining in executable.rs and the new pipeline.rs function are semantically equivalent with one minor difference: in the old code, identifier_at(func_ir.name_idx) was resolved and interned for every function (including generic ones that try_build_context skips). In the new code, lf.name_idx is resolved only for lowered (non-generic) functions. This is a harmless improvement — generic function names were interned but never inserted into the functions map.
Borrow/lifetime refactoring is sound: ModuleIR<'m> borrows &'m CompiledModule, and the borrow is held for the lifetime of the destack_and_lower_module call only. ExecutableBuilder retains &'a CompiledModule independently, so there is no aliasing or lifetime hazard.
func_id_map indexing: build_func_id_map(module_ir.module) correctly receives a &'m CompiledModule (via auto-deref of the field, which is already a reference). The map is indexed by FunctionHandleIndex, bounded by module.function_handles.len(), which is consistent with how try_build_context_inner accesses func_id_map[handle_idx.0 as usize]. No out-of-bounds risk relative to before.
Dependency scope correction is correct: Moving mono-move-gas from [dependencies] to [dev-dependencies] in global-context and adding it to [dependencies] in specializer correctly follows where gas instrumentation now lives. Removing move-vm-types from global-context's normal dependencies is safe — no source file in global-context/src/ imports from it after this change.
Blast Radius
| Changed Function | Non-test Callers | Classification |
|---|---|---|
destack(module: &'m CompiledModule, ...) |
2 (mseir-compiler.rs, testsuite.rs) |
LOW |
destack_and_lower_module |
1 (executable.rs) |
LOW |
ExecutableBuilder::build |
3 (gas_test.rs, types_tests.rs, testsuite/runner.rs) |
LOW |
Sent by Cursor Automation: Security Review Bot
| /// Run the full specializer pipeline: destack → lower → gas instrument → frame layout. | ||
| /// | ||
| /// Takes a reference to the compiled module — no cloning required. | ||
| pub fn destack_and_lower_module(module: &CompiledModule) -> Result<LoweredModule> { |
There was a problem hiding this comment.
[LOW] The public doc comment describes the function's purpose but omits the single-module-isolation constraint that the identity struct_name_table (lines 22–24) imposes. Future callers loading a module that cross-references structs from other already-loaded modules would silently receive incorrect Type representations.
Suggested addition to the doc comment:
/// # Limitations
/// The `struct_name_table` is built as an identity mapping (handle index == name index),
/// which is only correct when `module` is loaded in isolation. Multi-module loading
/// (where struct handles span multiple modules) requires a caller-supplied table.
There was a problem hiding this comment.
this is going away with new type integration
| .map(|i| StructNameIndex::new(i as u32)) | ||
| .collect(); | ||
| let module_ir = destack(module, &struct_name_table)?; | ||
| let func_id_map = build_func_id_map(module_ir.module); |
There was a problem hiding this comment.
Minor: build_func_id_map(module_ir.module) passes module_ir.module which is already a &'m CompiledModule. The function signature is build_func_id_map(module: &CompiledModule), so this is an auto-deref through a reference-to-reference — correct and compiles fine, but build_func_id_map(module_ir.module) reads slightly confusingly compared to the old build_func_id_map(&module_ir.module) pattern (where module was owned). No action required, just noting the shift in calling convention.
| /// IR for a module (wraps the original CompiledModule for pool access). | ||
| pub struct ModuleIR { | ||
| /// IR for a module (borrows the original CompiledModule for pool access). | ||
| pub struct ModuleIR<'m> { |
There was a problem hiding this comment.
I think we should keep it owning this? ModuleIR is constructed on cache miss (so CompiledModule is owned in any case) and we kind of need to keep it around for not yet monomorphized functions?
There was a problem hiding this comment.
I'll move it to owning for now. I was hoping we actually could store all the resolved information in the stackless exec IR, but that is a not-in-this-PR endeavor.
| pub use translate::lower_function; | ||
|
|
||
| /// Result of lowering a single non-generic function. | ||
| pub struct LoweredFunction { |
There was a problem hiding this comment.
How is this different from Function - because of allocations? Do we see these unified in the future?
There was a problem hiding this comment.
Yes and yes. Adding a TODO to reflect this.
| .collect(); | ||
| let module_ir = specializer::destack(self.module.clone(), &struct_name_table)?; | ||
| let func_id_map = specializer::lower::build_func_id_map(&module_ir.module); | ||
| let lowered = specializer::destack_and_lower_module(self.module)?; |
There was a problem hiding this comment.
Current builder is ugly and it takes module by ref, but really it should own it if we need to keep ModuleIR in cache?
There was a problem hiding this comment.
Yes, but this would require some bigger changes in the ExecutableBuilder due to borrow semantics. Added a TODO instead.
| .flat_map(|cs| cs.arg_write_slots.iter().chain(cs.ret_read_slots.iter())) | ||
| .map(|s| (s.offset + s.size) as usize) | ||
| .max() | ||
| // Leaf function: no callee slots needed beyond metadata. |
There was a problem hiding this comment.
This will be later extended right?
f1e65d0 to
17c1c4d
Compare
f5e87ee to
7700162
Compare
17c1c4d to
8e76988
Compare
8e76988 to
2f56ed6
Compare
7700162 to
bf4b0ba
Compare
bf4b0ba to
6cb09db
Compare
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
✅ Forge suite
|
✅ Forge suite
|



Description
Factored the specializer pipeline out of
executable.rsintospecializer::destack_and_lower_module().What moved into the specializer:
What stayed in executable.rs:
Other changes:
How Has This Been Tested?
Existing tests
Note
Medium Risk
Moderate risk: refactors the executable-building compilation pipeline (lowering, gas instrumentation, frame sizing) and changes
ModuleIR.functionsto includeNoneentries for native/generic functions, which could affect function indexing assumptions.Overview
Refactors the compilation/specialization pipeline by moving orchestration (destack → lower → gas instrumentation → frame layout sizing) out of
global-context’sExecutableBuilderand into a newspecializer::destack_and_lower_module()API.Changes specializer IR shapes and dependencies:
ModuleIR.functionsbecomesVec<Option<FunctionIR>>(preservingFunctionDefinitionIndexalignment and representing native functions asNone), call sites now iterate via.iter().flatten(), and newLoweredFunction/LoweredModulestructs carry gas-instrumented micro-ops plus precomputed frame sizes for consumption by the executable arena allocator.Cargo/deps updated:
mono-move-gasis added tospecializerand moved toglobal-contextdev-deps, with correspondingCargo.lockupdates.Reviewed by Cursor Bugbot for commit 6cb09db. Bugbot is set up for automated code reviews on this repo. Configure here.