[consensus] Fix BATCH_NUM_PER_BLOCK histogram buckets by danielxiangzl · Pull Request #19473 · aptos-labs/aptos-core

danielxiangzl · 2026-04-16T20:32:06Z

Summary

Fix BATCH_NUM_PER_BLOCK histogram to use PROOF_COUNT_BUCKETS instead of TRANSACTION_COUNT_BUCKETS
The old buckets were designed for transaction counts (exponential from 1.5 up to thousands), while batch counts per block are typically in the tens to low hundreds

Test plan

Compiles cleanly
Lint passes

🤖 Generated with Claude Code

Note

Low Risk
Low risk: metrics-only change that adjusts histogram bucket definitions for batch/proof counts and does not affect consensus logic or data handling.

Overview
Updates quorum store metrics to use PROOF_COUNT_BUCKETS (instead of transaction-oriented buckets) for count-based histograms: BATCH_NUM_PER_BLOCK, NUM_BATCHES_WITHOUT_PROOF_OF_STORE, and PROOF_SIZE_WHEN_PULL. This improves bucket granularity for typical batch/proof counts without changing any runtime behavior beyond metrics aggregation.

^{Reviewed by Cursor Bugbot for commit 9c62399. Bugbot is set up for automated code reviews on this repo. Configure here.}

Use PROOF_COUNT_BUCKETS instead of TRANSACTION_COUNT_BUCKETS for the batch-count-per-block metric. The old buckets were designed for transaction counts (exponential from 1.5 up to thousands), while batch counts per block are typically in the tens to low hundreds. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

cursor

Stale comment

Differential Security Review — PR #19473

Scope: c1dfd4f..2844c6f · 4 commits · 7 files · +107 / -44
Reviewer: Automated differential review (Trail of Bits / Cursor)

Executive Summary

Severity Count

CRITICAL 0

HIGH 1

MEDIUM 1

LOW 2

INFO 1

Overall risk: Medium. The metrics fix is correct and low-risk. The HACK code in round_manager.rs introduces a deliberate, unconditional 1-second proposal delay in production consensus that degrades liveness and lacks any gating mechanism; this is the primary concern. Forge test-suite changes are internally consistent but weaken land-blocking regression coverage.

Recommendation: REVIEW BEFORE MERGE

What Changed

Module Files Risk

consensus/src/quorum_store/ 1 Low — metrics only

consensus/src/ 1 High — production consensus behavior

testsuite/forge/src/ 2 Low — test infra

testsuite/forge-cli/src/suites/ 2 Medium — land-blocking threshold reduction

.github/workflows/ 1 Low — CI timeout increase

Findings

[HIGH] Unconditional HACK delay shipped in production consensus — no feature gate

File: consensus/src/round_manager.rs (around lines 485–519 and 1453–1489)
Blast radius: Affects every validator deployment where the author is the last by validator_index ordering; applied on round % 10 == 0 rounds unconditionally.
Test coverage: No tests verify the delay behavior or its absence in non-HACK builds.

Description: Two blocks labelled // HACK unconditionally delay proposal generation by 1 second (≥ round_initial_timeout_ms default of 1000 ms) for the last validator by ordered index, 10% of rounds. There is no #[cfg(test)], #[cfg(feature = "...")], or config-flag guard. The delay runs on any binary built from this diff — including mainnet validators.

Concrete impact:

Liveness degradation: For the targeted validator, every 10th round as proposer results in a near-certain timeout (1s delay with 1s timeout). This raises consensus round latency, increases timeout-vote traffic, and reduces throughput measurably.

Asymmetric burden: A specific operator (the one holding the highest validator_index in the epoch's ValidatorSet) is systematically penalized. This ordering is stable within an epoch but changes at epoch boundaries.

Log noise in production: warn!("HACK: delaying proposal…") fires in production logs every 10th proposer round for the affected node, confusing operators and alerting pipelines. The optimistic path (opt_proposal_delay) has no corresponding warn, adding asymmetric observability.

Mixed-build divergence: Any deployment mixing this build with a clean build gets behaviorally divergent validators at a consensus-critical path.

n == 1 amplification: On a single-validator network the condition i >= n - 1 is 0 >= 0 — every round divisible by 10 triggers the delay, not just 10%.

The commit message acknowledges this is for testing "slow proposer" behavior; it should not reach production without gating.

[MEDIUM] Land-blocking test stringency materially reduced — epoch regression coverage removed

File: testsuite/forge-cli/src/suites/realistic_environment.rs, testsuite/forge-cli/src/suites/land_blocking.rs
Test coverage: These ARE the tests; the change reduces what they catch.

Description: Three concurrent relaxations to realistic_env_max_load_test as used in land-blocking:

Traffic mode: EmitJobMode::MaxLoad { mempool_backlog: 38000 } → EmitJobMode::ConstTps { tps: 4000 }. MaxLoad stresses the network to its capacity ceiling; ConstTps at 4000 caps load regardless of what the chain can handle.

Success criteria: 10 000 TPS → 3 500 TPS. A 65% reduction in the minimum passing throughput bar.

Epoch duration: 300 s → 24 * 3600 s. No epoch transition occurs during any realistic test run. Consensus reconfiguration paths, state-sync edge cases at epoch boundaries, and on-chain governance driven by new epochs are no longer exercised by this job.

Concrete impact: Regressions that degrade throughput to between 3 500 and 10 000 TPS, or that break behavior at epoch boundaries, will now pass land-blocking. This is a test-coverage gap, not a runtime bug. Acceptable if a separate longer-running suite (e.g. forge-stable) retains the stricter criteria, but that should be confirmed.

[LOW] PROOF_SIZE_WHEN_PULL and NUM_BATCHES_WITHOUT_PROOF_OF_STORE still use transaction-scale buckets

File: consensus/src/quorum_store/counters.rs (lines ~406–413 and ~378–385)
Concrete impact: Metrics observability only — no runtime or security effect.

Description: The PR correctly fixes BATCH_NUM_PER_BLOCK to use PROOF_COUNT_BUCKETS (range ~1–500) instead of TRANSACTION_COUNT_BUCKETS (exponential up to ~25 000+). Two adjacent metrics that also measure batch/proof counts (not transaction counts) still use TRANSACTION_COUNT_BUCKETS:

PROOF_SIZE_WHEN_PULL — "number of proof-of-store per block when pulled"

NUM_BATCHES_WITHOUT_PROOF_OF_STORE — "number of batches without proof of store"

Both would benefit from PROOF_COUNT_BUCKETS for the same reason as BATCH_NUM_PER_BLOCK. Not blocking, but worth cleaning up in the same pass.

[LOW] Warn log for HACK missing on optimistic proposal path

File: consensus/src/round_manager.rs (~line 1453)
Concrete impact: No runtime bug; observability asymmetry if HACK ships.

Description: The regular proposal path emits warn!("HACK: delaying proposal…") when a delay fires. The optimistic proposal path (opt_proposal_delay) has no equivalent warn, so delayed opt proposals are silent — harder to correlate with latency spikes from that code path.

[INFO] FORGE_RUNNER_DURATION_SECS increase: CI wall-time impact

File: .github/workflows/docker-build-test.yaml
Concrete impact: Infrastructure cost / CI queue latency.

The realistic_env_max_load Forge job's runner duration increased from 480 s to 1800 s. Combined with the duration_override(900s) for land-blocking and the duration_override.unwrap_or(global_duration) pattern in runner.rs, the effective namespace TTL for land-blocking is 900 + 1200 = 2100 s. The CI job budget (1800 s) is tighter than the full cluster TTL, but the internal runner timeout is driven by duration_override (900 s) + test overhead, not the Forge runner duration directly. No correctness issue, but CI cost increases ~3.75×.

Test Coverage Assessment

Changed Component Coverage Risk Elevation

counters.rs bucket fix No dedicated test; bucket defs are observability-only No elevation

round_manager.rs HACK delay No test for delay presence or absence; no gate HIGH — production behavior change without test coverage

Forge config / runner plumbing Exercised by the test suite it configures No elevation

Land-blocking thresholds These are the tests — coverage gap introduced MEDIUM

Blast Radius

Changed Symbol Non-test callers Classification

BATCH_NUM_PER_BLOCK (bucket change) update_batch_stats (1 site) LOW — observability only

proposal_delay HACK Every process_new_round_event invocation on the target validator HIGH

opt_proposal_delay HACK Every start_next_opt_round invocation on the target validator HIGH

ForgeConfig::duration_override runner.rs (2 sites); land_blocking.rs (1 site) LOW — test infra

Recommendations

Before Merge

Gate or remove the HACK delay in round_manager.rs. If this behavior is needed to test slow-proposer resilience, it should be gated behind #[cfg(feature = "failpoints")], a dedicated ConsensusConfig field (e.g. test_proposal_delay_ms), or moved to the failpoint injection system (fail::fail_point!) already used elsewhere in the consensus codebase.

Confirm a stricter suite covers epoch transitions — if forge-stable or another scheduled job still runs the higher-backlog / shorter-epoch configuration, document that explicitly. If not, the epoch regression coverage loss in land-blocking is unmitigated.

Nice-to-have

Add PROOF_COUNT_BUCKETS to PROOF_SIZE_WHEN_PULL and NUM_BATCHES_WITHOUT_PROOF_OF_STORE to complete the bucket alignment started in this PR.

Add warn! to the optimistic proposal delay path for symmetric observability.

_{Sent by Cursor Automation: Security Review Bot}

cursor · 2026-04-16T20:40:15Z

        "Histogram for the number of batches per (committed) blocks.",
        &["type"],
-        TRANSACTION_COUNT_BUCKETS.clone(),
+        PROOF_COUNT_BUCKETS.clone(),


Bucket fix is correct. BATCH_NUM_PER_BLOCK measures proof/batch counts per block (observed values of proof_num, inline_batch_num, opt_batch_num — all small integers, typically < 500). PROOF_COUNT_BUCKETS (max 500) is the right scale; TRANSACTION_COUNT_BUCKETS (exponential to ~25 000+) was semantically mismatched.

Two adjacent metrics with the same scale mismatch remain: PROOF_SIZE_WHEN_PULL (line ~408) and NUM_BATCHES_WITHOUT_PROOF_OF_STORE (line ~380) — both measure proof/batch counts but still use TRANSACTION_COUNT_BUCKETS.

cursor

Differential Security Review — PR #19473: Fix BATCH_NUM_PER_BLOCK histogram buckets

👀

Date: 2026-04-16
Scope: Single commit dc419c3688 — consensus/src/quorum_store/counters.rs, line 169
Reviewer: Automated differential review

Executive Summary

Severity	Count
CRITICAL	0
HIGH	0
MEDIUM	0
LOW	0
Informational	2

Overall risk: Low. This is a metrics-only change that corrects histogram bucket semantics with no effect on consensus correctness, validation logic, or security properties.

Key metrics: 1 file changed, 1 Rust line changed, 1 finding (informational consistency gap)

Recommendation: APPROVE WITH NOTES

What Changed

Commits: c1dfd4f..dc419c3 (1 commit)
Files changed: 1 (1 Rust, 0 Move, 0 config) | Lines: +1 / -1

Module	Files Changed	Risk Level
`consensus/src/quorum_store/`	1	LOW

Phase 0: Orientation

Change scope: 1 file changed (1 Rust). Strategy: Full trace (tiny diff).
Critical files: None.
Risk assessment: Metrics/observability change only. Histogram bucket boundaries have zero impact on consensus protocol, execution, validation, or security-critical paths.

File Risk Classification

File	Complexity	Risk	Rationale
`consensus/src/quorum_store/counters.rs`	Low	LOW	Prometheus `static Lazy<HistogramVec>` bucket swap; no state, no logic

Phase 1: Pattern Detection

BEFORE:

TRANSACTION_COUNT_BUCKETS.clone(),  // exponential_buckets(1.5, 1.5, 25) → ~1.5 to ~25,251

AFTER:

PROOF_COUNT_BUCKETS.clone(),  // [1, 3, 5, 7, 10, 12, 15, 20, 25, 30, 40, 50, 60, 75, 100, 125, 150, 200, 250, 300, 500]

CHANGE: Histogram bucket layout. TRANSACTION_COUNT_BUCKETS uses exponential_buckets(1.5, 1.5, 25), producing 25 upper bounds from ~1.5 to ~25,251. PROOF_COUNT_BUCKETS is a hand-tuned set of 21 bounds from 1 to 500, giving dense granularity in the 1–100 range. The metric BATCH_NUM_PER_BLOCK observes the count of proof-type batches, inline batches, and opt-batches per committed block.

SECURITY: None — histogram bucket changes are pure observability. No validation, no branching, no state mutation.

No Rust red flags triggered (no new .unwrap(), no unsafe, no removed checks, no visibility changes, no arithmetic).

Phase 2: Semantic Analysis

Semantic correctness of the bucket choice:

BATCH_NUM_PER_BLOCK observes results of block.proof_stats(), block.inline_batch_stats(), and block.opt_batch_stats(). These return batch/proof counts per committed block, not transaction counts. In normal Aptos mainnet operation, blocks carry tens to low hundreds of batches — not the thousands that TRANSACTION_COUNT_BUCKETS was designed for. PROOF_COUNT_BUCKETS (1–500) provides better resolution in that range and is already used by related proof-count histograms in the same file (PROOFS_WITHOUT_BATCH_SUMMARY, PROOFS_WITH_BATCH_SUMMARY, NUM_PROOFS_IN_PROOF_QUEUE_AFTER_PULL). The fix aligns naming and semantics consistently.

Blast radius: BATCH_NUM_PER_BLOCK is only written inside update_batch_stats(), which is called from one site in consensus/src/counters.rs:update_counters_for_block. No external code references the static directly. No consensus decision depends on histogram values.

Findings

[INFORMATIONAL] Finding 1 — `PROOF_SIZE_WHEN_PULL` and `NUM_BATCHES_WITHOUT_PROOF_OF_STORE` remain on `TRANSACTION_COUNT_BUCKETS`

File: consensus/src/quorum_store/counters.rs:407 and :378
Blast radius: Observability only
Test coverage: N/A (metrics registration)

Description: Two other histograms that count proofs or batches (not transactions) still use TRANSACTION_COUNT_BUCKETS:

quorum_store_proof_size_when_pull — observes proof_of_stores.len() on each block pull (call site: batch_proof_queue.rs:577)
num_batches_without_proof_of_store — observes num_batches_without_proof() in proof manager

These are semantically identical to BATCH_NUM_PER_BLOCK (small-integer batch/proof counts), yet they use the coarse exponential transaction buckets that start at 1.5. This inconsistency means dashboards and alerts get fine resolution for batch_num_per_block but not for these two related metrics.

Concrete impact: No current exploit, no correctness issue. Dashboard/alerting resolution mismatch only.

Note: Not a blocker for this PR; suitable as a follow-up.

[INFORMATIONAL] Finding 2 — Metrics series continuity on rollout

File: consensus/src/quorum_store/counters.rs:164

Description: Changing histogram bucket definitions causes Prometheus _bucket time-series label sets to shift on process restart. Any dashboards or alerts keyed to specific bucket boundaries (e.g. le="25.62890625" from the old exponential set) will stop matching. This is operational, not a security concern.

Concrete impact: No current exploit. Operator action may be needed to update dashboards after rollout.

Test Coverage Analysis

Changed Element	Coverage	Risk Elevation	Recommendation
`BATCH_NUM_PER_BLOCK` bucket definition	N/A — metrics registration	None	No action needed; bucket changes are not unit-testable at the Rust level in a meaningful security sense

Blast Radius

Changed Symbol	Non-Test Callers	Classification	Critical Callers
`BATCH_NUM_PER_BLOCK` (indirect via `update_batch_stats`)	1 (`consensus/src/counters.rs`)	LOW	None

Highest-risk dependency chains: None — update_batch_stats is a fire-and-forget metrics observer.

Historical Context

The original introduction of BATCH_NUM_PER_BLOCK in commit f37103cef0 ([counters] adding/fixing batch counters) used TRANSACTION_COUNT_BUCKETS without documented rationale. There is no indication that bucket choice was tied to alerting thresholds or security invariants. This PR's fix is not a regression of a security measure.

Recommendations

Before Production

(Optional, non-blocking) Follow-up PR to align PROOF_SIZE_WHEN_PULL and NUM_BATCHES_WITHOUT_PROOF_OF_STORE with PROOF_COUNT_BUCKETS for consistency

Operational

Update any dashboards or alerts that query quorum_store_batch_num_per_block with specific le= bucket boundaries after rollout

_{Sent by Cursor Automation: Security Review Bot}

cursor · 2026-04-16T20:41:50Z

        "Histogram for the number of batches per (committed) blocks.",
        &["type"],
-        TRANSACTION_COUNT_BUCKETS.clone(),
+        PROOF_COUNT_BUCKETS.clone(),


[INFORMATIONAL] The bucket swap from TRANSACTION_COUNT_BUCKETS (exponential, ~1.5 to ~25,251) to PROOF_COUNT_BUCKETS (hand-tuned, 1–500) is semantically correct: this metric counts batches/proofs per committed block, which in practice stays in the tens-to-low-hundreds range. The fix also makes this metric consistent with PROOFS_WITHOUT_BATCH_SUMMARY, PROOFS_WITH_BATCH_SUMMARY, and NUM_PROOFS_IN_PROOF_QUEUE_AFTER_PULL, which already use PROOF_COUNT_BUCKETS.

Note for follow-up: PROOF_SIZE_WHEN_PULL (line 407) and NUM_BATCHES_WITHOUT_PROOF_OF_STORE (line 378) are semantically equivalent (probe counts, not tx counts) but still use TRANSACTION_COUNT_BUCKETS. Aligning those in a separate PR would complete the consistency story.

Address review comment: NUM_BATCHES_WITHOUT_PROOF_OF_STORE and PROOF_SIZE_WHEN_PULL measure proof/batch counts (small integers), so they should use PROOF_COUNT_BUCKETS rather than the transaction-scaled TRANSACTION_COUNT_BUCKETS. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-04-20T20:59:17Z

✅ Forge suite `compat` success on `ca049383dd80675149ef2d0042668964f9f9107a` ==> `9c623999dd9d9cda3e397850d170957db7493387`

Compatibility test results for ca049383dd80675149ef2d0042668964f9f9107a ==> 9c623999dd9d9cda3e397850d170957db7493387 (PR)
1. Check liveness of validators at old version: ca049383dd80675149ef2d0042668964f9f9107a
compatibility::simple-validator-upgrade::liveness-check : committed: 13721.66 txn/s, latency: 2529.76 ms, (p50: 2400 ms, p70: 2700, p90: 3400 ms, p99: 4600 ms), latency samples: 447700
2. Upgrading first Validator to new version: 9c623999dd9d9cda3e397850d170957db7493387
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 6327.72 txn/s, latency: 5312.44 ms, (p50: 5700 ms, p70: 5900, p90: 6000 ms, p99: 6100 ms), latency samples: 223720
3. Upgrading rest of first batch to new version: 9c623999dd9d9cda3e397850d170957db7493387
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 6356.18 txn/s, latency: 5310.87 ms, (p50: 5900 ms, p70: 6000, p90: 6200 ms, p99: 6300 ms), latency samples: 217700
4. upgrading second batch to new version: 9c623999dd9d9cda3e397850d170957db7493387
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 11095.80 txn/s, latency: 2957.08 ms, (p50: 3100 ms, p70: 3300, p90: 3500 ms, p99: 3800 ms), latency samples: 364720
5. check swarm health
Compatibility test for ca049383dd80675149ef2d0042668964f9f9107a ==> 9c623999dd9d9cda3e397850d170957db7493387 passed
Test Ok

github-actions · 2026-04-20T21:01:54Z

✅ Forge suite `framework_upgrade` success on `ca049383dd80675149ef2d0042668964f9f9107a` ==> `9c623999dd9d9cda3e397850d170957db7493387`

Compatibility test results for ca049383dd80675149ef2d0042668964f9f9107a ==> 9c623999dd9d9cda3e397850d170957db7493387 (PR)
Upgrade the nodes to version: 9c623999dd9d9cda3e397850d170957db7493387
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 2226.84 txn/s, submitted: 2233.42 txn/s, failed submission: 6.59 txn/s, expired: 6.59 txn/s, latency: 1291.49 ms, (p50: 1200 ms, p70: 1500, p90: 1800 ms, p99: 2700 ms), latency samples: 202801
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 2432.26 txn/s, submitted: 2439.77 txn/s, failed submission: 7.52 txn/s, expired: 7.52 txn/s, latency: 1197.05 ms, (p50: 1100 ms, p70: 1200, p90: 1800 ms, p99: 2700 ms), latency samples: 219962
5. check swarm health
Compatibility test for ca049383dd80675149ef2d0042668964f9f9107a ==> 9c623999dd9d9cda3e397850d170957db7493387 passed
Upgrade the remaining nodes to version: 9c623999dd9d9cda3e397850d170957db7493387
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1469.40 txn/s, submitted: 1473.70 txn/s, failed submission: 4.30 txn/s, expired: 4.30 txn/s, latency: 2386.54 ms, (p50: 1200 ms, p70: 1500, p90: 6000 ms, p99: 16900 ms), latency samples: 129820
Test Ok

github-actions · 2026-04-20T21:02:43Z

✅ Forge suite `realistic_env_max_load` success on `9c623999dd9d9cda3e397850d170957db7493387`

two traffics test: inner traffic : committed: 13212.33 txn/s, latency: 1379.23 ms, (p50: 1500 ms, p70: 1500, p90: 1800 ms, p99: 2300 ms), latency samples: 4934200
two traffics test : committed: 99.99 txn/s, latency: 624.66 ms, (p50: 500 ms, p70: 600, p90: 900 ms, p99: 1000 ms), latency samples: 1660
Latency breakdown for phase 0: ["MempoolToBlockCreation: max: 0.736, avg: 0.710", "ConsensusProposalToOrdered: max: 0.125, avg: 0.116", "ConsensusOrderedToCommit: max: 0.162, avg: 0.153", "ConsensusProposalToCommit: max: 0.285, avg: 0.269"]
Max non-epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 0.50s no progress at version 4905479 (avg 0.06s) [limit 15].
Max epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 0.35s no progress at version 2204541 (avg 0.35s) [limit 16].
Test Ok

danielxiangzl requested review from a team, bchocho, ibalajiarun, sasha8 and zekun000 as code owners April 16, 2026 20:32

danielxiangzl marked this pull request as draft April 16, 2026 20:32

danielxiangzl force-pushed the daniel/fix-batch-num-buckets branch from 2844c6f to dc419c3 Compare April 16, 2026 20:33

danielxiangzl marked this pull request as ready for review April 16, 2026 20:34

danielxiangzl requested review from grao1991 and removed request for a team April 16, 2026 20:34

cursor Bot reviewed Apr 16, 2026

View reviewed changes

grao1991 approved these changes Apr 16, 2026

View reviewed changes

zekun000 approved these changes Apr 20, 2026

View reviewed changes

danielxiangzl enabled auto-merge (squash) April 20, 2026 20:13

Merge branch 'main' into daniel/fix-batch-num-buckets

9c62399

This comment has been minimized.

Sign in to view

danielxiangzl merged commit 81b3d57 into main Apr 21, 2026
78 of 83 checks passed

danielxiangzl deleted the daniel/fix-batch-num-buckets branch April 21, 2026 00:24

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[consensus] Fix BATCH_NUM_PER_BLOCK histogram buckets#19473

[consensus] Fix BATCH_NUM_PER_BLOCK histogram buckets#19473
danielxiangzl merged 3 commits intomainfrom
daniel/fix-batch-num-buckets

danielxiangzl commented Apr 16, 2026 •

edited by cursor Bot

Loading

Uh oh!

cursor Bot left a comment •

edited

Loading

Uh oh!

cursor Bot Apr 16, 2026

Uh oh!

cursor Bot left a comment

Uh oh!

cursor Bot Apr 16, 2026

Uh oh!

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

github-actions Bot commented Apr 20, 2026

Uh oh!

github-actions Bot commented Apr 20, 2026

Uh oh!

github-actions Bot commented Apr 20, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Module	Files	Risk
`consensus/src/quorum_store/`	1	Low — metrics only
`consensus/src/`	1	High — production consensus behavior
`testsuite/forge/src/`	2	Low — test infra
`testsuite/forge-cli/src/suites/`	2	Medium — land-blocking threshold reduction
`.github/workflows/`	1	Low — CI timeout increase

Changed Component	Coverage	Risk Elevation
`counters.rs` bucket fix	No dedicated test; bucket defs are observability-only	No elevation
`round_manager.rs` HACK delay	No test for delay presence or absence; no gate	HIGH — production behavior change without test coverage
Forge config / runner plumbing	Exercised by the test suite it configures	No elevation
Land-blocking thresholds	These are the tests — coverage gap introduced	MEDIUM

Changed Symbol	Non-test callers	Classification
`BATCH_NUM_PER_BLOCK` (bucket change)	`update_batch_stats` (1 site)	LOW — observability only
`proposal_delay` HACK	Every `process_new_round_event` invocation on the target validator	HIGH
`opt_proposal_delay` HACK	Every `start_next_opt_round` invocation on the target validator	HIGH
`ForgeConfig::duration_override`	`runner.rs` (2 sites); `land_blocking.rs` (1 site)	LOW — test infra

Conversation

danielxiangzl commented Apr 16, 2026 • edited by cursor Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Uh oh!

cursor Bot left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Differential Security Review — PR #19473

Executive Summary

What Changed

Findings

[HIGH] Unconditional HACK delay shipped in production consensus — no feature gate

[MEDIUM] Land-blocking test stringency materially reduced — epoch regression coverage removed

[LOW] PROOF_SIZE_WHEN_PULL and NUM_BATCHES_WITHOUT_PROOF_OF_STORE still use transaction-scale buckets

[LOW] Warn log for HACK missing on optimistic proposal path

[INFO] FORGE_RUNNER_DURATION_SECS increase: CI wall-time impact

Test Coverage Assessment

Blast Radius

Recommendations

Before Merge

Nice-to-have

Uh oh!

cursor Bot Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Differential Security Review — PR #19473: Fix BATCH_NUM_PER_BLOCK histogram buckets

Executive Summary

What Changed

Phase 0: Orientation

File Risk Classification

Phase 1: Pattern Detection

Phase 2: Semantic Analysis

Findings

[INFORMATIONAL] Finding 1 — PROOF_SIZE_WHEN_PULL and NUM_BATCHES_WITHOUT_PROOF_OF_STORE remain on TRANSACTION_COUNT_BUCKETS

[INFORMATIONAL] Finding 2 — Metrics series continuity on rollout

Test Coverage Analysis

Blast Radius

Historical Context

Recommendations

Before Production

Operational

Uh oh!

cursor Bot Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

github-actions Bot commented Apr 20, 2026

✅ Forge suite compat success on ca049383dd80675149ef2d0042668964f9f9107a ==> 9c623999dd9d9cda3e397850d170957db7493387

Uh oh!

github-actions Bot commented Apr 20, 2026

✅ Forge suite framework_upgrade success on ca049383dd80675149ef2d0042668964f9f9107a ==> 9c623999dd9d9cda3e397850d170957db7493387

Uh oh!

github-actions Bot commented Apr 20, 2026

✅ Forge suite realistic_env_max_load success on 9c623999dd9d9cda3e397850d170957db7493387

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

danielxiangzl commented Apr 16, 2026 •

edited by cursor Bot

Loading

cursor Bot left a comment •

edited

Loading

[LOW] `PROOF_SIZE_WHEN_PULL` and `NUM_BATCHES_WITHOUT_PROOF_OF_STORE` still use transaction-scale buckets

[INFO] `FORGE_RUNNER_DURATION_SECS` increase: CI wall-time impact

[INFORMATIONAL] Finding 1 — `PROOF_SIZE_WHEN_PULL` and `NUM_BATCHES_WITHOUT_PROOF_OF_STORE` remain on `TRANSACTION_COUNT_BUCKETS`

✅ Forge suite `compat` success on `ca049383dd80675149ef2d0042668964f9f9107a` ==> `9c623999dd9d9cda3e397850d170957db7493387`

✅ Forge suite `framework_upgrade` success on `ca049383dd80675149ef2d0042668964f9f9107a` ==> `9c623999dd9d9cda3e397850d170957db7493387`

✅ Forge suite `realistic_env_max_load` success on `9c623999dd9d9cda3e397850d170957db7493387`