
[consensus] Lower opt_qs_minimum_batch_age_usecs from 50ms to 20ms#19480

Draft
danielxiangzl wants to merge 2 commits into daniel/self-opt-metrics from daniel/lower-min-batch-age


@danielxiangzl
Contributor

Summary

Test plan

  • Deploy on euwe6-1 only (canary)
  • Observe via Phase 1 metrics for 24h:
    • aptos_consensus_self_opt_proposal_committed / spawned: opt success rate (committed divided by spawned); should stay at ~95% or higher
    • quorum_store_self_proposed_batches_by_type_and_author{type=opt_batch}: should increase
    • quorum_store_batch_skipped_too_young: should drop
    • aptos_consensus_proposal_payload_availability_count{status=missing} on recipient validators: watch for spikes
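The opt success rate in the first check is the ratio of how much the two counters grew over the same observation window. A minimal sketch, assuming each counter is read once at the start and once at the end of the 24h window; the `CounterWindow` type and both helper functions are hypothetical and not part of aptos-core. Only the metric names and the ~95% target come from the test plan.

```rust
/// Hypothetical snapshot of one monotonic counter, read at the start and end
/// of an observation window (e.g. the 24h canary window).
#[derive(Debug, Clone, Copy)]
struct CounterWindow {
    start: u64,
    end: u64,
}

impl CounterWindow {
    /// Increase over the window. A counter reset (end < start) is treated as
    /// zero increase, which is a simplification.
    fn delta(&self) -> u64 {
        self.end.saturating_sub(self.start)
    }
}

/// Opt success rate = committed / spawned over the same window.
/// Returns None when no self-opt proposals were spawned.
fn opt_success_rate(committed: CounterWindow, spawned: CounterWindow) -> Option<f64> {
    let spawned = spawned.delta();
    if spawned == 0 {
        return None;
    }
    Some(committed.delta() as f64 / spawned as f64)
}

/// True if the rate is known and meets the minimum (0.95 in the test plan).
fn success_rate_ok(rate: Option<f64>, min_rate: f64) -> bool {
    matches!(rate, Some(r) if r >= min_rate)
}

fn main() {
    // Illustrative counter readings, not real canary data.
    let committed = CounterWindow { start: 1_000, end: 1_960 };
    let spawned = CounterWindow { start: 1_000, end: 2_000 };
    let rate = opt_success_rate(committed, spawned);
    println!("opt success rate: {:?}, ok: {}", rate, success_rate_ok(rate, 0.95));
}
```

Returning `None` for an empty window keeps "no opt proposals at all" distinct from "every opt proposal failed", which matters on a single-node canary with low traffic.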

Rationale

From euwe6-1 data (1h window, 2026-04-16):

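  • Batch age at inline pull: p50 = 19ms, p90 = 39ms
  • The current 50ms threshold keeps every batch younger than 50ms off the opt path, i.e. at least 90% of inline-pulled batches (p90 = 39ms)
  • EU proof formation takes ~15-30ms, well under 50ms, so EU batches are rarely opt-eligible
  • 20ms leaves a safe margin over the ~5ms intra-EU one-way latency plus jitter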
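To make the threshold comparison concrete, here is a minimal sketch, assuming eligibility is a plain `age >= opt_qs_minimum_batch_age_usecs` check. `is_old_enough`, `margin_usecs`, and the two constants are hypothetical illustrations, not the quorum store's actual code. Under this reading, the p50 batch (19ms) still falls just short of a 20ms gate, while the p90 batch (39ms) passes 20ms but not 50ms.

```rust
/// Illustrative values for the knob changed by this PR, in microseconds.
/// The real config plumbing in aptos-core is not shown here.
const OLD_MIN_BATCH_AGE_USECS: u64 = 50_000; // 50ms (before)
const NEW_MIN_BATCH_AGE_USECS: u64 = 20_000; // 20ms (this PR)

/// Hypothetical age gate: a batch passes only once it is at least
/// `min_age_usecs` old; younger batches would count as "skipped too young".
fn is_old_enough(created_usecs: u64, now_usecs: u64, min_age_usecs: u64) -> bool {
    now_usecs.saturating_sub(created_usecs) >= min_age_usecs
}

/// Margin left after a one-way network delay (negative means no margin).
fn margin_usecs(min_age_usecs: u64, one_way_usecs: u64) -> i64 {
    min_age_usecs as i64 - one_way_usecs as i64
}

fn main() {
    // Inline-pull batch ages quoted in the rationale (euwe6-1).
    let created = 1_000_000u64;
    for (label, age_ms) in [("p50", 19u64), ("p90", 39u64)] {
        let now = created + age_ms * 1_000;
        println!(
            "{label} ({age_ms}ms): passes 50ms gate = {}, passes 20ms gate = {}",
            is_old_enough(created, now, OLD_MIN_BATCH_AGE_USECS),
            is_old_enough(created, now, NEW_MIN_BATCH_AGE_USECS),
        );
    }
    // 20ms threshold vs. ~5ms intra-EU one-way latency.
    println!("margin: {}us", margin_usecs(NEW_MIN_BATCH_AGE_USECS, 5_000));
}
```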

🤖 Generated with Claude Code

Experiment to increase the opt batch inclusion rate, especially for EU-authored
batches, where proofs form in 15-30ms and the current 50ms threshold makes
them almost always ineligible for the opt path.

Data from euwe6-1:
- Batch age when pulled inline: p50=19ms, p90=39ms
- Opt pull currently starts at 50ms, so many batches that could be eligible are skipped
- Intra-EU one-way network latency is ~5ms; 20ms gives a safe margin for dissemination

Expected effects:
- BATCH_SKIPPED_TOO_YOUNG rate drops
- Per-author opt batch ratio increases (especially EU authors)
- Opt proposal success rate should stay similar (monitor via Phase 1 metrics)
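One way to compute the per-author opt batch ratio from per-batch type labels, such as those behind `quorum_store_self_proposed_batches_by_type_and_author`, is sketched below. The function name, the author names, and the non-opt `other` label are hypothetical; only the `opt_batch` type label comes from the test plan.

```rust
use std::collections::HashMap;

/// Fraction of each author's self-proposed batches whose type label is
/// `opt_batch`. Input is one (author, type) pair per batch.
fn opt_ratio_by_author(samples: &[(&str, &str)]) -> HashMap<String, f64> {
    // author -> (opt batches, total batches)
    let mut counts: HashMap<String, (u64, u64)> = HashMap::new();
    for &(author, batch_type) in samples {
        let entry = counts.entry(author.to_string()).or_insert((0, 0));
        if batch_type == "opt_batch" {
            entry.0 += 1;
        }
        entry.1 += 1;
    }
    // Every entry has total >= 1, so the division is always defined.
    counts
        .into_iter()
        .map(|(author, (opt, total))| (author, opt as f64 / total as f64))
        .collect()
}

fn main() {
    // Illustrative samples only; not real euwe6-1 data.
    let samples = [
        ("eu_author", "opt_batch"),
        ("eu_author", "other"),
        ("eu_author", "other"),
        ("us_author", "opt_batch"),
        ("us_author", "opt_batch"),
    ];
    let mut ratios: Vec<(String, f64)> = opt_ratio_by_author(&samples).into_iter().collect();
    ratios.sort_by(|a, b| a.0.cmp(&b.0));
    for (author, ratio) in ratios {
        println!("{author}: {ratio:.2}");
    }
}
```

Comparing this ratio per author before and after the change isolates the EU effect better than a fleet-wide total, which could hide a shift concentrated in a few authors.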

Based on daniel/self-opt-metrics (PR #19479), which adds the observation
metrics. Deploy on euwe6-1 only for the Phase 2 canary.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@danielxiangzl added the labels CICD:build-performance-images (build performance docker image variants) and CICD:run-e2e-tests (when this label is present, GitHub Actions runs all land-blocking e2e tests from the PR) on Apr 17, 2026

… 10ms

Phase 3 of the min_batch_age experiment, continuing from the 20ms change
in the previous commit.

10ms is the natural next step:
- Intra-EU one-way network latency is ~5ms; 10ms still leaves a small margin for jitter
- Captures ~90% of inline-pulled batches (euwe6-1 baseline: inline p10=10ms, p50=19ms, p90=39ms)
- Expected to push EU-authored opt ratio further
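The ~90% figure follows from the p10 value: if roughly 10% of inline-pulled batches are younger than 10ms, an `age >= 10ms` rule lets the other ~90% through. A minimal sketch of that empirical-CDF check, using a hypothetical helper `fraction_at_or_above` and synthetic ages that only loosely mimic the quoted percentiles (they are not euwe6-1 measurements):

```rust
/// Share of observed batch ages (ms) that meet a minimum-age threshold,
/// i.e. the fraction an `age >= threshold` rule would let through.
fn fraction_at_or_above(ages_ms: &[u64], threshold_ms: u64) -> f64 {
    if ages_ms.is_empty() {
        return 0.0;
    }
    let passing = ages_ms.iter().filter(|&&age| age >= threshold_ms).count();
    passing as f64 / ages_ms.len() as f64
}

fn main() {
    // Synthetic ages loosely shaped like the quoted baseline
    // (p10 ~ 10ms, p50 ~ 19ms, p90 ~ 39ms); not real measurements.
    let ages_ms = [9u64, 10, 12, 15, 19, 19, 24, 30, 39, 45];
    for threshold_ms in [50, 20, 10] {
        println!(
            "{threshold_ms}ms threshold: {:.0}% of batches old enough",
            100.0 * fraction_at_or_above(&ages_ms, threshold_ms)
        );
    }
}
```

Running the same function over real batch-age samples from the 20ms canary would show directly how much headroom the 10ms step adds.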

Deploy on euwe6-1 only, after the 20ms setting has been observed for long enough.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@github-actions
Contributor

✅ Forge suite compat success on da70c48185fb85346c1e404ba45f2498824cbd22 ==> 9a47362134c69c025cb8acf16b8eb703e04594c7

Compatibility test results for da70c48185fb85346c1e404ba45f2498824cbd22 ==> 9a47362134c69c025cb8acf16b8eb703e04594c7 (PR)
1. Check liveness of validators at old version: da70c48185fb85346c1e404ba45f2498824cbd22
compatibility::simple-validator-upgrade::liveness-check : committed: 14096.38 txn/s, latency: 2443.06 ms, (p50: 2400 ms, p70: 2700, p90: 3300 ms, p99: 4200 ms), latency samples: 466640
2. Upgrading first Validator to new version: 9a47362134c69c025cb8acf16b8eb703e04594c7
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 6722.75 txn/s, latency: 5025.74 ms, (p50: 5500 ms, p70: 5600, p90: 5700 ms, p99: 5800 ms), latency samples: 229620
3. Upgrading rest of first batch to new version: 9a47362134c69c025cb8acf16b8eb703e04594c7
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 6842.73 txn/s, latency: 4925.73 ms, (p50: 5400 ms, p70: 5500, p90: 5600 ms, p99: 5800 ms), latency samples: 233560
4. Upgrading second batch to new version: 9a47362134c69c025cb8acf16b8eb703e04594c7
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 10825.14 txn/s, latency: 3105.51 ms, (p50: 3300 ms, p70: 3400, p90: 3500 ms, p99: 3700 ms), latency samples: 356980
5. check swarm health
Compatibility test for da70c48185fb85346c1e404ba45f2498824cbd22 ==> 9a47362134c69c025cb8acf16b8eb703e04594c7 passed
Test Ok

@github-actions
Contributor

✅ Forge suite realistic_env_max_load success on 9a47362134c69c025cb8acf16b8eb703e04594c7

two traffics test: inner traffic : committed: 16027.59 txn/s, latency: 1074.61 ms, (p50: 1000 ms, p70: 1100, p90: 1200 ms, p99: 1600 ms), latency samples: 5987080
two traffics test : committed: 100.00 txn/s, latency: 802.17 ms, (p50: 700 ms, p70: 900, p90: 1000 ms, p99: 1200 ms), latency samples: 1700
Latency breakdown for phase 0: ["MempoolToBlockCreation: max: 0.291, avg: 0.257", "ConsensusProposalToOrdered: max: 0.117, avg: 0.110", "ConsensusOrderedToCommit: max: 0.200, avg: 0.173", "ConsensusProposalToCommit: max: 0.303, avg: 0.283"]
Max non-epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 0.53s no progress at version 6566179 (avg 0.06s) [limit 15].
Max epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 0.25s no progress at version 3092082 (avg 0.25s) [limit 16].
Test Ok
