[consensus] Add metrics for self-opt-proposal observation #19479

Draft
danielxiangzl wants to merge 1 commit into main from daniel/self-opt-metrics

@danielxiangzl
Contributor

Summary

  • Add 3 counters to measure this validator's own opt-proposal behavior
  • Enables opt success rate query: rate(committed) / rate(spawned)
  • Enables per-batch-author opt ratio query, scoped to self-proposed blocks
  • No behavior change; this is phase 1 of an experiment to lower opt_qs_minimum_batch_age_usecs
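
The success-rate query in the summary divides the per-second increase of the committed counter by that of the spawned counter. A minimal offline sketch of that arithmetic follows, assuming two scrapes of each counter. The function names and sample values are hypothetical and not taken from this PR, and counter resets are not handled.

```python
def counter_rate(first, last):
    """Per-second increase between two (timestamp_secs, value) samples."""
    (t0, v0), (t1, v1) = first, last
    if t1 <= t0:
        raise ValueError("samples must be in increasing time order")
    return (v1 - v0) / (t1 - t0)


def opt_success_rate(spawned, committed):
    """rate(committed) / rate(spawned); None when nothing was spawned."""
    spawned_rate = counter_rate(*spawned)
    if spawned_rate == 0:
        return None
    return counter_rate(*committed) / spawned_rate


# Hypothetical scrapes taken 60 seconds apart (not real validator data).
spawned_samples = ((0.0, 100.0), (60.0, 160.0))
committed_samples = ((0.0, 90.0), (60.0, 144.0))
success = opt_success_rate(spawned_samples, committed_samples)
```

Returning None for an idle window avoids a division by zero when this validator was not leader during the interval.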

Test plan

  • Lint passes (./scripts/rust_lint.sh)
  • Cargo check passes
  • Deploy on euwe6-1 and apne1-0 to verify metrics emit expected values

🤖 Generated with Claude Code

Add three counters to observe this validator's opt-proposal behavior:
- aptos_consensus_self_opt_proposal_spawned: opt proposals spawned as leader
- aptos_consensus_self_opt_proposal_committed: opt proposals committed
- quorum_store_self_proposed_batches_by_type_and_author: per-author batch
  type breakdown in blocks proposed by this validator

These let us measure opt success rate (committed / spawned) and per-author
opt ratio specifically in this validator's own proposals. No behavior change.
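
The per-author opt ratio can be sketched as the opt share of each author's batches in self-proposed blocks. The example below assumes the counter carries a batch-type label and an author label; the label values ("opt", "proof", "inline"), the author names, and the counts are hypothetical and are not confirmed by this PR.

```python
from collections import defaultdict


def opt_ratio_by_author(samples):
    """Map each author to opt_increase / total_increase.

    samples: iterable of (batch_type, author, counter_increase) tuples,
    one per label combination over the same time window.
    """
    opt = defaultdict(float)
    total = defaultdict(float)
    for batch_type, author, increase in samples:
        total[author] += increase
        if batch_type == "opt":  # assumed label value
            opt[author] += increase
    # Authors with no batches in the window are left out of the result.
    return {a: opt[a] / total[a] for a in total if total[a] > 0}


# Hypothetical counter increases for two batch authors.
window = [
    ("opt", "author_a", 30.0),
    ("proof", "author_a", 70.0),
    ("opt", "author_b", 5.0),
    ("inline", "author_b", 15.0),
]
ratios = opt_ratio_by_author(window)
```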

Phase 1 of a planned experiment to lower opt_qs_minimum_batch_age_usecs
from 50ms to 20ms, where we need baseline metrics before the config change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@danielxiangzl danielxiangzl added the CICD:build-performance-images (build performance docker image variants) and CICD:run-e2e-tests (when this label is present github actions will run all land-blocking e2e tests from the PR) labels Apr 17, 2026
danielxiangzl added a commit that referenced this pull request Apr 17, 2026
Experiment to increase opt batch inclusion rate, especially for EU-authored
batches where proof forms in 15-30ms and the current 50ms threshold makes
them almost always ineligible for opt path.

Data from euwe6-1:
- Batch age when pulled inline: p50=19ms, p90=39ms
- Current opt pull starts at 50ms - many eligible batches are skipped
- Intra-EU one-way network ~5ms; 20ms gives safe margin for dissemination

Expected effects:
- BATCH_SKIPPED_TOO_YOUNG rate drops
- Per-author opt batch ratio increases (especially EU authors)
- Opt proposal success rate should stay similar (monitor via Phase 1 metrics)
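
The effect of the threshold change can be pictured as a filter on batch age. The sketch below assumes a batch is skipped when it is younger than the configured minimum, which is consistent with the BATCH_SKIPPED_TOO_YOUNG name, though the actual check is not shown in this thread. The helper names, the inclusive comparison, and the sample ages are hypothetical.

```python
CURRENT_MIN_AGE_USECS = 50_000   # 50ms, the current value per the commit message
PROPOSED_MIN_AGE_USECS = 20_000  # 20ms, the value proposed for the experiment


def eligible_for_opt(batch_age_usecs, min_age_usecs):
    """Assumed rule: a batch qualifies once it is at least min_age old."""
    return batch_age_usecs >= min_age_usecs


def count_skipped_too_young(batch_ages_usecs, min_age_usecs):
    """Number of batches the assumed rule would skip at this threshold."""
    return sum(
        1 for age in batch_ages_usecs
        if not eligible_for_opt(age, min_age_usecs)
    )


# Hypothetical batch ages in microseconds (not the euwe6-1 measurements).
ages = [12_000, 19_000, 25_000, 39_000, 55_000]
skipped_at_50ms = count_skipped_too_young(ages, CURRENT_MIN_AGE_USECS)
skipped_at_20ms = count_skipped_too_young(ages, PROPOSED_MIN_AGE_USECS)
```

Under this assumed rule, lowering the threshold can only keep or reduce the number of skipped batches, which matches the first expected effect listed above.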

Based on daniel/self-opt-metrics (PR #19479) which adds the observation
metrics. Deploy on euwe6-1 only for Phase 2 canary.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@github-actions
Contributor

✅ Forge suite realistic_env_max_load success on cb44bcc8ee6b7417b5d061994d1b6b2e1c765c46

two traffics test: inner traffic : committed: 15792.97 txn/s, latency: 1110.63 ms, (p50: 1000 ms, p70: 1100, p90: 1300 ms, p99: 1700 ms), latency samples: 5898380
two traffics test : committed: 99.99 txn/s, latency: 767.17 ms, (p50: 700 ms, p70: 800, p90: 900 ms, p99: 1200 ms), latency samples: 1720
Latency breakdown for phase 0: ["MempoolToBlockCreation: max: 0.359, avg: 0.324", "ConsensusProposalToOrdered: max: 0.130, avg: 0.125", "ConsensusOrderedToCommit: max: 0.197, avg: 0.176", "ConsensusProposalToCommit: max: 0.318, avg: 0.300"]
Max non-epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 0.49s no progress at version 6277575 (avg 0.06s) [limit 15].
Max epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 0.31s no progress at version 2348205 (avg 0.31s) [limit 16].
Test Ok

@github-actions
Contributor

✅ Forge suite compat success on da70c48185fb85346c1e404ba45f2498824cbd22 ==> cb44bcc8ee6b7417b5d061994d1b6b2e1c765c46

Compatibility test results for da70c48185fb85346c1e404ba45f2498824cbd22 ==> cb44bcc8ee6b7417b5d061994d1b6b2e1c765c46 (PR)
1. Check liveness of validators at old version: da70c48185fb85346c1e404ba45f2498824cbd22
compatibility::simple-validator-upgrade::liveness-check : committed: 14216.78 txn/s, latency: 2415.32 ms, (p50: 2400 ms, p70: 2700, p90: 3100 ms, p99: 4500 ms), latency samples: 468620
2. Upgrading first Validator to new version: cb44bcc8ee6b7417b5d061994d1b6b2e1c765c46
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 6332.41 txn/s, latency: 5307.06 ms, (p50: 5800 ms, p70: 5900, p90: 6100 ms, p99: 6200 ms), latency samples: 219340
3. Upgrading rest of first batch to new version: cb44bcc8ee6b7417b5d061994d1b6b2e1c765c46
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 6234.38 txn/s, latency: 5436.65 ms, (p50: 6000 ms, p70: 6100, p90: 6200 ms, p99: 6400 ms), latency samples: 215500
4. upgrading second batch to new version: cb44bcc8ee6b7417b5d061994d1b6b2e1c765c46
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 10886.79 txn/s, latency: 2951.64 ms, (p50: 3100 ms, p70: 3200, p90: 3400 ms, p99: 3700 ms), latency samples: 362620
5. check swarm health
Compatibility test for da70c48185fb85346c1e404ba45f2498824cbd22 ==> cb44bcc8ee6b7417b5d061994d1b6b2e1c765c46 passed
Test Ok
