
[mono-move] Add gas benchmarks #19470

Merged
calintat merged 1 commit into main from calin/gas-benchmarks
Apr 21, 2026

@calintat (Contributor) commented Apr 16, 2026

Description

Add gas-instrumented variants to all micro-op benchmarks (fib, bst, merge_sort, nested_loop) and a new match_sum benchmark with a wide-diamond CFG shape. Each benchmark now runs both a plain and a gas-instrumented version, making it easy to measure gas metering overhead going forward.
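
To make the "wide-diamond CFG" shape concrete, here is a minimal native sketch: one branch point fans out into several arms that all rejoin at the same place on every loop iteration. The function body, the arm count, and the arithmetic are hypothetical and are not taken from the actual match_sum program in this PR.

```rust
/// Hypothetical stand-in for a match-heavy benchmark kernel.
/// Each iteration has a single entry, many parallel arms, and a single
/// join point, which is the "wide diamond" control-flow shape.
pub fn match_sum_sketch(n: u64) -> u64 {
    let mut sum = 0u64;
    for i in 0..n {
        // Fan-out: one basic block per arm.
        let term = match i % 4 {
            0 => i,
            1 => i.wrapping_mul(2),
            2 => i.wrapping_add(7),
            _ => 1,
        };
        // Join: every arm falls through to this accumulation.
        sum = sum.wrapping_add(term);
    }
    sum
}
```

Because a block-level instrumentor inserts one charge per basic block, a shape like this adds a charge at the branch point, one in each arm, and one at the join, which is why it is a useful stress case for metering overhead.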

How Has This Been Tested?

Key Areas to Review

Type of Change

  • New feature
  • Bug fix
  • Breaking change
  • Performance improvement
  • Refactoring
  • Dependency update
  • Documentation update
  • Tests

Which Components or Systems Does This Change Impact?

  • Validator Node
  • Full Node (API, Indexer, etc.)
  • Move/Aptos Virtual Machine
  • Aptos Framework
  • Aptos CLI/SDK
  • Developer Infrastructure
  • Move Compiler
  • Other (specify)

Checklist

  • I have read and followed the CONTRIBUTING doc
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I identified and added all stakeholders and component owners affected by this change as reviewers
  • I tested both happy and unhappy path of the functionality
  • I have made corresponding changes to the documentation

Note

Low Risk
Low risk: changes are limited to benchmarking/test utilities and a new synthetic program, with minimal impact on runtime behavior outside benches.

Overview
Adds gas-measurement coverage to the micro-op benchmarks by running each benchmark in two modes: a plain execution using the new NoOpGasMeter, and a gas-instrumented execution that replays the same program after GasInstrumentor has inserted Charge ops.
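
The two modes can be pictured with a small sketch of a meter trait and two implementations. Everything below is an assumption made for illustration: the trait methods, the OutOfGas error, and run_loop are hypothetical, and the real GasMeter, NoOpGasMeter, and SimpleGasMeter in mono-move may have different signatures. The only detail taken from this page is that the no-op meter's balance() reports u64::MAX.

```rust
/// Error returned when a meter runs out of budget (hypothetical).
#[derive(Debug, PartialEq, Eq)]
pub struct OutOfGas;

/// Hypothetical meter interface; the real trait may differ.
pub trait GasMeter {
    fn charge(&mut self, amount: u64) -> Result<(), OutOfGas>;
    fn balance(&self) -> u64;
}

/// Accepts every charge and never runs out.
pub struct NoOpGasMeter;

impl GasMeter for NoOpGasMeter {
    fn charge(&mut self, _amount: u64) -> Result<(), OutOfGas> {
        Ok(())
    }
    fn balance(&self) -> u64 {
        u64::MAX
    }
}

/// Deducts each charge from a finite budget.
pub struct SimpleGasMeter {
    balance: u64,
}

impl SimpleGasMeter {
    pub fn new(balance: u64) -> Self {
        Self { balance }
    }
}

impl GasMeter for SimpleGasMeter {
    fn charge(&mut self, amount: u64) -> Result<(), OutOfGas> {
        // Leave the balance untouched if the charge cannot be paid.
        self.balance = self.balance.checked_sub(amount).ok_or(OutOfGas)?;
        Ok(())
    }
    fn balance(&self) -> u64 {
        self.balance
    }
}

/// Stand-in for an interpreter loop that is generic over the meter,
/// so the same code path runs in both benchmark modes.
pub fn run_loop<G: GasMeter>(
    meter: &mut G,
    iterations: u64,
    cost_per_iter: u64,
) -> Result<u64, OutOfGas> {
    let mut acc = 0u64;
    for i in 0..iterations {
        meter.charge(cost_per_iter)?;
        acc = acc.wrapping_add(i);
    }
    Ok(acc)
}
```

Calling run_loop with NoOpGasMeter and with SimpleGasMeter::new(budget) on the same workload isolates the cost of the bookkeeping itself, which mirrors what the plain and gas-instrumented benchmark variants are meant to measure.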

Introduces shared bench helper gas_instrument to clone and instrument micro-op Function tables into a fresh arena, and adds a new match_sum synthetic program + Criterion bench (including correctness tests) designed with a wide-diamond CFG to stress basic-block boundary instrumentation. Also makes MicroOp Copy/Clone to simplify handling in instrumentation/bench code.
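
As a rough illustration of basic-block boundary instrumentation, the sketch below inserts one Charge op at the head of every basic block of a toy program and rewrites jump targets to account for the inserted ops. The Op enum, the cost model (one unit per op in the block), and the instrument and run functions are all hypothetical; the real GasInstrumentor and MicroOp set are not shown on this page and may work differently. Jump targets are assumed to be in range.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Op {
    Add(u64),
    /// Jump to `target` if the accumulator is even.
    JumpIfEven { target: usize },
    Jump { target: usize },
    /// Gas for the basic block that starts at this position.
    Charge(u64),
    Return,
}

/// Inserts a `Charge` before every block leader (toy cost: ops per block).
pub fn instrument(code: &[Op]) -> Vec<Op> {
    // 1. Leaders: the entry, every jump target, and every op after a
    //    control-flow op.
    let mut leader = vec![false; code.len()];
    if !code.is_empty() {
        leader[0] = true;
    }
    for (i, op) in code.iter().enumerate() {
        match *op {
            Op::Jump { target } | Op::JumpIfEven { target } => {
                leader[target] = true;
                if i + 1 < code.len() {
                    leader[i + 1] = true;
                }
            }
            Op::Return => {
                if i + 1 < code.len() {
                    leader[i + 1] = true;
                }
            }
            _ => {}
        }
    }
    // 2. For each old index, the new index where its block's Charge lands
    //    (only meaningful for leaders, which is all that jumps refer to).
    let mut new_pos = vec![0usize; code.len()];
    let mut inserted = 0;
    for i in 0..code.len() {
        new_pos[i] = i + inserted;
        if leader[i] {
            inserted += 1;
        }
    }
    // 3. Emit the charges and the retargeted ops.
    let mut out = Vec::with_capacity(code.len() + inserted);
    for (i, op) in code.iter().enumerate() {
        if leader[i] {
            let len = 1 + (i + 1..code.len()).take_while(|&j| !leader[j]).count();
            out.push(Op::Charge(len as u64));
        }
        out.push(match *op {
            Op::Jump { target } => Op::Jump { target: new_pos[target] },
            Op::JumpIfEven { target } => Op::JumpIfEven { target: new_pos[target] },
            other => other,
        });
    }
    out
}

/// Tiny interpreter: returns the final accumulator and the gas charged.
pub fn run(code: &[Op], mut acc: u64) -> (u64, u64) {
    let (mut pc, mut gas) = (0usize, 0u64);
    loop {
        match code[pc] {
            Op::Add(n) => { acc += n; pc += 1; }
            Op::Charge(n) => { gas += n; pc += 1; }
            Op::Jump { target } => pc = target,
            Op::JumpIfEven { target } => {
                pc = if acc % 2 == 0 { target } else { pc + 1 };
            }
            Op::Return => return (acc, gas),
        }
    }
}
```

In this sketch the plain program and the instrumented program compute the same result; only the charged total differs. That is the property the plain and gas benchmark pairs rely on when comparing timings.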

Reviewed by Cursor Bugbot for commit 54cf9d7. Bugbot is set up for automated code reviews on this repo.

@calintat calintat marked this pull request as ready for review April 16, 2026 14:29

@cursor cursor Bot left a comment


Differential Security Review — [mono-move] Add gas benchmarks (PR #19470)

Date: 2026-04-16
Scope: 48c6902d0bb1fc2b04d5a89da07e1a3d2942698b...2bd66122499f5fc664346f13a353a94fa863d7c8
Reviewer: Automated differential review


Executive Summary

| Severity | Count |
| --- | --- |
| CRITICAL | 0 |
| HIGH | 0 |
| MEDIUM | 0 |
| LOW | 2 |

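Overall risk: Low. The change is scoped to benchmark infrastructure and test programs under third_party/move/mono-move/programs/, plus two small supporting changes listed under What Changed: NoOpGasMeter in mono-move/gas and the Copy/Clone derive on MicroOp in mono-move/core. No production execution path, consensus code, VM dispatch logic, or storage is touched.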

Key metrics: 12 files changed (12 Rust, 0 Move), 1 module touched (mono-move/programs + mono-move/gas), 2 findings.

Recommendation: APPROVE WITH NOTES


What Changed

Files changed: 12 (all Rust) | Lines: +545 / -8

| Module | Files Changed | Risk Level |
| --- | --- | --- |
| mono-move/gas/src/lib.rs | 1 | Low |
| mono-move/core/src/instruction/mod.rs | 1 | Low |
| mono-move/programs/benches/ | 5 (3 modified, 2 new) | Low |
| mono-move/programs/src/ | 2 (1 modified, 1 new) | Low |
| mono-move/programs/tests/ | 1 (new) | Low |
| mono-move/programs/Cargo.toml | 1 | Trivial |

Findings

[LOW] Finding 1 — NoOpGasMeter exported from production crate without a feature or #[cfg] guard

File: third_party/move/mono-move/gas/src/lib.rs:82–92
Test coverage: Not tested (intended as bench/test helper)

Description: NoOpGasMeter is defined as a top-level pub struct in mono_move_gas::lib, the same crate and visibility level as SimpleGasMeter. Its doc comment says "for testing," but there is no #[cfg(any(test, feature = "testing"))] gate, no #[doc(hidden)], and no module-level barrier preventing its use in production execution paths.

InterpreterContext in the runtime is generic over G: GasMeter. Any future code that passes NoOpGasMeter where a real meter is expected would silently bypass all gas enforcement, with balance() always returning u64::MAX.

Concrete impact: No current exploit — the type is only consumed by bench binaries in this PR. The risk is that this API footgun grows into a future misuse as the codebase matures.

Why here: The type was introduced specifically to support the micro_op (plain, no-instrumentation) benchmark variants. A #[cfg(test)] guard or placement inside a testing feature gate would match the pattern already used elsewhere (e.g. #[cfg(feature = "testing")] in programs/src/lib.rs).
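
A minimal sketch of the suggested gate follows. The trait shape and the `testing` feature name are assumptions for illustration (the feature would have to be declared in the crate's Cargo.toml), and the helper noop_meter_available is hypothetical.

```rust
/// Hypothetical, reduced meter interface.
pub trait GasMeter {
    fn balance(&self) -> u64;
}

/// Compiled only for this crate's own tests or when the (assumed)
/// `testing` feature is enabled; otherwise the type does not exist.
#[cfg(any(test, feature = "testing"))]
pub struct NoOpGasMeter;

#[cfg(any(test, feature = "testing"))]
impl GasMeter for NoOpGasMeter {
    fn balance(&self) -> u64 {
        u64::MAX
    }
}

/// The real meter stays available in every build.
pub struct SimpleGasMeter {
    pub remaining: u64,
}

impl GasMeter for SimpleGasMeter {
    fn balance(&self) -> u64 {
        self.remaining
    }
}

/// Reports at compile time whether the no-op meter was built in.
pub fn noop_meter_available() -> bool {
    cfg!(any(test, feature = "testing"))
}
```

One design point to keep in mind: `cfg(test)` is set only while compiling the defining crate's own tests, so bench targets in a different crate would not see a type gated on `#[cfg(test)]` alone. Under this sketch they would need to enable the feature on their dependency instead.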


[LOW] Finding 2 — gas_instrument bench helper silently drops GC root metadata

File: third_party/move/mono-move/programs/benches/helpers.rs:228–238
Test coverage: Bench-only, not part of any test target

Description: gas_instrument builds gas-instrumented copies of functions but replaces both frame_layout and safe_point_layouts with empty stubs:

frame_layout: FrameLayoutInfo::empty(&arena),
safe_point_layouts: SortedSafePointEntries::empty(&arena),

The function's doc comment acknowledges this: "Frame layouts are re-created as empty; these benchmark programs do not trigger GC, so the omission has no effect on execution." That is correct for the four benchmark programs today (all scalar-only, no heap pointer slots).

However, the function signature accepts any &[Option<ExecutableArenaPtr<Function>>] with no type-level or debug_assert! enforcement of the "no heap pointer locals" precondition. If a future program with GC-managed pointer slots is passed to this helper, the GC would fail to scan pointer-holding frame slots, causing silent corruption. There is no #[cfg(test)] or #[cfg(feature = "testing")] guard on the function itself.

Concrete impact: No current exploit — all four programs passed to gas_instrument today are scalar-only. The risk is a latent copy-paste footgun for any future benchmark program that allocates heap objects.
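
The precondition could be made explicit along the lines of the sketch below. The types are simplified stand-ins: Function, FrameLayout, and the owned Vec-based signature are hypothetical, and only the heap_ptr_offsets field name is borrowed from the inline review comment on this PR. The real helper works on arena pointers and is unsafe.

```rust
/// Simplified stand-in for frame layout metadata.
#[derive(Clone, Debug, Default)]
pub struct FrameLayout {
    /// Frame offsets of slots that hold heap pointers (GC roots).
    pub heap_ptr_offsets: Vec<u32>,
}

/// Simplified stand-in for a micro-op function.
#[derive(Clone, Debug)]
pub struct Function {
    pub name: String,
    pub code: Vec<u8>, // placeholder for the micro-op sequence
    pub frame_layout: FrameLayout,
}

/// Copies each function with an empty frame layout, as the bench helper
/// is described as doing, but checks the "no GC roots" precondition first.
pub fn gas_instrument_checked(funcs: &[Option<Function>]) -> Vec<Option<Function>> {
    funcs
        .iter()
        .map(|slot| {
            slot.as_ref().map(|f| {
                // Active in debug builds only; release benches pay nothing.
                debug_assert!(
                    f.frame_layout.heap_ptr_offsets.is_empty(),
                    "frame layout would be dropped, but `{}` has heap-pointer slots",
                    f.name
                );
                Function {
                    name: f.name.clone(),
                    code: f.code.clone(), // instrumentation itself is omitted here
                    frame_layout: FrameLayout::default(),
                }
            })
        })
        .collect()
}
```

Since debug_assert! compiles to nothing in release builds, a check like this would catch a pointer-carrying program when run in a debug build without affecting benchmark timings.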


Test Coverage Analysis

| Changed Function / Path | Coverage | Notes |
| --- | --- | --- |
| NoOpGasMeter | Bench-only | No unit test asserting the no-op contract |
| gas_instrument helper | Bench-only | No test asserting correctness of instrumented output |
| micro_op_match_sum + bench | Well-tested | tests/match_sum.rs covers native, micro_op, move_bytecode |
| MicroOp: Copy + Clone derive | Implicit | Exercised by raw.to_vec() in gas_instrument |

Blast Radius

All changed functions are confined to benchmark and test code; none are callable from the production execution pipeline. The NoOpGasMeter type is the only change reachable from outside test/bench contexts (as a pub export), but no production code currently imports it.


Correctness Notes (Not Findings)

  • Arena lifetime in match_sum / nested_loop gas setup: let (functions, _, _arena) = micro_op_match_sum() followed by let (functions_gas, _arena) = unsafe { helpers::gas_instrument(&wrapped) } shadows the original _arena. This is correct: shadowing only hides the earlier binding and does not drop its value, so the original arena stays alive until the end of the enclosing scope and outlives the gas_instrument call. The original functions/wrapped pointers are also never used in any closure after the shadowing.

  • Hardcoded functions[6] in bst.rs: Pre-existing pattern; the comment // Function 6 — run_ops in src/bst.rs:315 confirms the index is stable. Not introduced by this PR.

  • Missing Function::resolve_calls for match_sum / nested_loop gas variants: Correct omission — these programs contain no CallFunc ops (no inter-function calls), so no patching is needed.

  • descriptors reuse across bst.rs bench setups: The gas benchmark closure captures descriptors from the first micro_op_bst() call while using functions_gas from the second. This is safe because ObjectDescriptor is plain data with no arena pointers.
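
The arena-lifetime note relies on how Rust treats shadowed bindings, which a small stand-alone sketch can demonstrate. The Arena type here is a hypothetical stand-in that records when it is dropped; it is not the arena type used in mono-move.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical arena stand-in that logs its label when dropped.
pub struct Arena {
    label: &'static str,
    log: Rc<RefCell<Vec<&'static str>>>,
}

impl Drop for Arena {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.label);
    }
}

/// Returns the order of events for a setup that shadows `_arena`.
pub fn shadowing_drop_order() -> Vec<&'static str> {
    let log = Rc::new(RefCell::new(Vec::new()));
    {
        let _arena = Arena { label: "original", log: Rc::clone(&log) };
        log.borrow_mut().push("instrument");
        // Shadowing creates a new binding; the first `_arena` is hidden,
        // not dropped, and stays alive until the end of this block.
        let _arena = Arena { label: "instrumented", log: Rc::clone(&log) };
        log.borrow_mut().push("bench body");
    } // locals are dropped here, in reverse declaration order
    let events = log.borrow().clone();
    events
}
```

The returned log is "instrument", "bench body", "instrumented", "original": both arenas survive the whole block, and the original is dropped last.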


Recommendations

Before Production Use

  • Gate NoOpGasMeter behind #[cfg(any(test, feature = "testing"))] or move it to a dedicated test-helper module — prevents accidental use in execution paths as the codebase grows.
  • Add a debug_assert! to gas_instrument that all input functions have empty frame_layout (or document the precondition more explicitly), to catch future benchmark programs with heap pointer locals before they produce GC bugs.

Sent by Cursor Automation: Security Review Bot

}

/// A no-op gas meter for testing.
pub struct NoOpGasMeter;

[LOW] NoOpGasMeter is exported as a top-level pub struct with no #[cfg(test)] or feature gate. Since InterpreterContext is generic over G: GasMeter, this type is silently usable in any execution context. The doc comment says "for testing" but there is no compile-time enforcement. Consider #[cfg(any(test, feature = "testing"))], consistent with the pattern already used in programs/src/lib.rs.

})
.collect();
(new_fns, arena)
}

[LOW] frame_layout and safe_point_layouts are silently dropped (replaced with empty stubs). The doc comment explains this is safe for programs that don't trigger GC, but the function accepts any input without enforcing that precondition. If a future benchmark program has heap-pointer locals, the GC will silently miss those roots. A debug_assert! on func.frame_layout.heap_ptr_offsets being empty, or a note in the # Safety section, would make the precondition explicit.

@calintat calintat enabled auto-merge (squash) April 21, 2026 16:43
@calintat calintat force-pushed the calin/gas-benchmarks branch from 2bd6612 to 54cf9d7 April 21, 2026 16:43
@github-actions

This comment has been minimized.

@github-actions

✅ Forge suite compat success on ca049383dd80675149ef2d0042668964f9f9107a ==> 54cf9d73b4701bd54beecd529abf8022309f80fc

Compatibility test results for ca049383dd80675149ef2d0042668964f9f9107a ==> 54cf9d73b4701bd54beecd529abf8022309f80fc (PR)
1. Check liveness of validators at old version: ca049383dd80675149ef2d0042668964f9f9107a
compatibility::simple-validator-upgrade::liveness-check : committed: 14215.43 txn/s, latency: 2417.81 ms, (p50: 2400 ms, p70: 2700, p90: 3100 ms, p99: 3500 ms), latency samples: 473320
2. Upgrading first Validator to new version: 54cf9d73b4701bd54beecd529abf8022309f80fc
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 6235.94 txn/s, latency: 5423.42 ms, (p50: 5900 ms, p70: 6000, p90: 6100 ms, p99: 6200 ms), latency samples: 218740
3. Upgrading rest of first batch to new version: 54cf9d73b4701bd54beecd529abf8022309f80fc
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 6354.84 txn/s, latency: 5359.96 ms, (p50: 5900 ms, p70: 6000, p90: 6100 ms, p99: 6200 ms), latency samples: 220680
4. upgrading second batch to new version: 54cf9d73b4701bd54beecd529abf8022309f80fc
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 11053.36 txn/s, latency: 2928.14 ms, (p50: 3100 ms, p70: 3200, p90: 3400 ms, p99: 3500 ms), latency samples: 363700
5. check swarm health
Compatibility test for ca049383dd80675149ef2d0042668964f9f9107a ==> 54cf9d73b4701bd54beecd529abf8022309f80fc passed
Test Ok

@github-actions

✅ Forge suite realistic_env_max_load success on 54cf9d73b4701bd54beecd529abf8022309f80fc

Forge report malformed: Expecting property name enclosed in double quotes: line 2 column 1 (char 2)
'{\n[2026-04-21T23:19:03Z INFO  aptos_forge::report] Test Ok\n  "metrics": [\n    {\n      "test_name": "two traffics test: inner traffic",\n      "metric": "submitted_txn",\n      "value": 5933420.0\n    },\n    {\n      "test_name": "two traffics test: inner traffic",\n      "metric": "expired_txn",\n      "value": 0.0\n    },\n    {\n      "test_name": "two traffics test: inner traffic",\n      "metric": "avg_tps",\n      "value": 15886.23278806848\n    },\n    {\n      "test_name": "two traffics test: inner traffic",\n      "metric": "avg_latency",\n      "value": 1083.840425926363\n    },\n    {\n      "test_name": "two traffics test: inner traffic",\n      "metric": "p50_latency",\n      "value": 1000.0\n    },\n    {\n      "test_name": "two traffics test: inner traffic",\n      "metric": "p90_latency",\n      "value": 1200.0\n    },\n    {\n      "test_name": "two traffics test: inner traffic",\n      "metric": "p99_latency",\n      "value": 1600.0\n    },\n    {\n      "test_name": "two traffics test",\n      "metric": "submitted_txn",\n      "value": 42600.0\n    },\n    {\n      "test_name": "two traffics test",\n      "metric": "expired_txn",\n      "value": 0.0\n    },\n    {\n      "test_name": "two traffics test",\n      "metric": "avg_tps",\n      "value": 99.98399980554233\n    },\n    {\n      "test_name": "two traffics test",\n      "metric": "avg_latency",\n      "value": 836.9186046511628\n    },\n    {\n      "test_name": "two traffics test",\n      "metric": "p50_latency",\n      "value": 800.0\n    },\n    {\n      "test_name": "two traffics test",\n      "metric": "p90_latency",\n      "value": 1000.0\n    },\n    {\n      "test_name": "two traffics test",\n      "metric": "p99_latency",\n      "value": 1100.0\n    }\n  ],\n  "text": "two traffics test: inner traffic : committed: 15886.23 txn/s, latency: 1083.84 ms, (p50: 1000 ms, p70: 1100, p90: 1200 ms, p99: 1600 ms), latency samples: 5933420\\ntwo traffics test : committed: 99.98 txn/s, 
latency: 836.92 ms, (p50: 800 ms, p70: 900, p90: 1000 ms, p99: 1100 ms), latency samples: 1720\\nLatency breakdown for phase 0: [\\"MempoolToBlockCreation: max: 0.278, avg: 0.260\\", \\"ConsensusProposalToOrdered: max: 0.118, avg: 0.114\\", \\"ConsensusOrderedToCommit: max: 0.204, avg: 0.175\\", \\"ConsensusProposalToCommit: max: 0.315, avg: 0.289\\"]\\nMax non-epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 0.46s no progress at version 6012009 (avg 0.06s) [limit 15].\\nMax epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 0.33s no progress at version 2808488 (avg 0.33s) [limit 16].\\nTest Ok"\n}'
Trailing Log Lines:
[2026-04-21T23:18:58Z INFO  ureq::unit] sending request POST http://vmagent-victoria-metrics-agent.victoria-metrics.svc:8429/api/v1/import/prometheus
test CompositeNetworkTest ... ok
Test Statistics: 
two traffics test: inner traffic : committed: 15886.23 txn/s, latency: 1083.84 ms, (p50: 1000 ms, p70: 1100, p90: 1200 ms, p99: 1600 ms), latency samples: 5933420
two traffics test : committed: 99.98 txn/s, latency: 836.92 ms, (p50: 800 ms, p70: 900, p90: 1000 ms, p99: 1100 ms), latency samples: 1720
Latency breakdown for phase 0: ["MempoolToBlockCreation: max: 0.278, avg: 0.260", "ConsensusProposalToOrdered: max: 0.118, avg: 0.114", "ConsensusOrderedToCommit: max: 0.204, avg: 0.175", "ConsensusProposalToCommit: max: 0.315, avg: 0.289"]
Max non-epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 0.46s no progress at version 6012009 (avg 0.06s) [limit 15].
Max epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 0.33s no progress at version 2808488 (avg 0.33s) [limit 16].
Test Ok

=== BEGIN JUNIT ===
<?xml version="1.0" encoding="UTF-8"?>
<testsuites name="forge" tests="1" failures="0" errors="0" uuid="b27fea43-186b-4bd6-8533-d5acee2fb3d4">
    <testsuite name="local" tests="1" disabled="0" errors="0" failures="0">
        <testcase name="CompositeNetworkTest(network:multi-region-network-emulation(two traffics test)) with ">
        </testcase>
    </testsuite>
</testsuites>
=== END JUNIT ===
[2026-04-21T23:19:03Z INFO  aptos_forge::backend::k8s::cluster_helper] Deleting namespace forge-e2e-pr-19470: Some(NamespaceStatus { conditions: None, phase: Some("Terminating") })
[2026-04-21T23:19:03Z INFO  aptos_forge::backend::k8s::cluster_helper] aptos-node resources for Forge removed in namespace: forge-e2e-pr-19470
[2026-04-21T23:19:03Z INFO  ureq::unit] sending request POST http://vmagent-victoria-metrics-agent.victoria-metrics.svc:8429/api/v1/import/prometheus

test result: ok. 1 passed; 0 soft failed; 0 hard failed; 0 filtered out

Debugging output:
NAME                                         READY   STATUS      RESTARTS   AGE
aptos-node-0-validator-0                     1/1     Running     0          12m
aptos-node-1-validator-0                     1/1     Running     0          12m
aptos-node-2-validator-0                     1/1     Running     0          12m
aptos-node-3-validator-0                     1/1     Running     0          12m
aptos-node-4-validator-0                     1/1     Running     0          12m
aptos-node-5-validator-0                     1/1     Running     0          12m
aptos-node-6-validator-0                     1/1     Running     0          12m
forge-pfn-deployer-bsgx7                     0/1     Completed   0          13m
forge-testnet-deployer-fpbx8                 0/1     Completed   0          13m
genesis-aptos-genesis-eforge4af72268-pmhmb   0/1     Completed   0          12m
pfn-0-0                                      1/1     Running     0          12m
pfn-1-0                                      1/1     Running     0          12m
pfn-2-0                                      1/1     Running     0          12m

@calintat calintat merged commit e48def4 into main Apr 21, 2026
74 of 76 checks passed
@calintat calintat deleted the calin/gas-benchmarks branch April 21, 2026 23:19