Skip to content

Fix problematic transaction handling#7183

Open
brice-stacks wants to merge 5 commits into
stacks-network:developfrom
brice-stacks:fix/problematic-txs
Open

Fix problematic transaction handling#7183
brice-stacks wants to merge 5 commits into
stacks-network:developfrom
brice-stacks:fix/problematic-txs

Conversation

@brice-stacks
Copy link
Copy Markdown
Contributor

This branch does two things:

When a transaction hits the execution time limit, mark it as problematic
When a transaction is problematic, revert the cost accumulation from that transaction

@coveralls
Copy link
Copy Markdown

coveralls commented May 4, 2026

Coverage Report for CI Build 25801484257

Coverage decreased (-0.06%) to 85.648%

Details

  • Coverage decreased (-0.06%) from the base build.
  • Patch coverage: 20 uncovered changes across 4 files (110 of 130 lines covered, 84.62%).
  • 5464 coverage regressions across 98 files.

Uncovered Changes

File Changed Covered %
stackslib/src/chainstate/stacks/miner.rs 14 7 50.0%
stackslib/src/net/api/postblock_proposal.rs 14 7 50.0%
clarity/src/vm/clarity.rs 5 1 20.0%
stackslib/src/chainstate/stacks/mod.rs 3 1 33.33%

Coverage Regressions

5464 previously-covered lines in 98 files lost coverage.

Top 10 Files by Coverage Loss Lines Losing Coverage Coverage
stackslib/src/config/mod.rs 358 69.09%
stackslib/src/chainstate/stacks/index/storage.rs 262 80.55%
stackslib/src/chainstate/stacks/miner.rs 238 83.06%
stackslib/src/net/inv/epoch2x.rs 221 79.44%
stackslib/src/chainstate/stacks/db/transactions.rs 203 97.16%
stackslib/src/net/chat.rs 201 93.01%
stackslib/src/chainstate/stacks/db/mod.rs 196 86.23%
stackslib/src/chainstate/stacks/index/trie_sql.rs 190 69.7%
stackslib/src/core/mempool.rs 170 86.87%
stackslib/src/chainstate/stacks/index/node.rs 161 87.28%

Coverage Stats

Coverage Status
Relevant Lines: 219809
Covered Lines: 188263
Line Coverage: 85.65%
Coverage Strength: 18940648.33 hits per line

💛 - Coveralls

Copy link
Copy Markdown
Contributor

@benjamin-stacks benjamin-stacks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on my reading of the code, I think there's something problematic with this change (pun not intended but taken anyway), unless I've missed something.

When the miner selects from the mempool, it applies a 30s execution time budget per transaction:

settings.max_execution_time,

When the signer's node validates a block proposal, it applies a 60s execution time budget for the whole block:

let block_deadline = Instant::now() + Duration::from_secs(timeout_secs);

Three scenarios:

1) The miner is using its default setting of max 5 seconds per mempool sweep. In that case, it seems your budget-reset change seems to be no-op, because if we've blown past 30 seconds, we've certainly blown past 5.

2) The miner has a high enough max_miner_time_ms for the budget reset to matter, but it's below the signer's block time budget. In this case, I think we're fine (although a miner second might not be a signer second, so the line is blurry).

3) The max_miner_time_ms is even higher. Let's say it's 70 seconds. In that case, the miner might create a block with these transactions:

  • Tx 1: 29 seconds
  • Tx 2: 29 seconds
  • Tx 3: 3 seconds

It would then propose the block to the signers, which would identify Tx 3 as problematic and, via #6964, the miner will drop Tx 3 even though it's the only nice transaction in the bunch.


If I'm right, we might still say that that's fine, I just want us to be explicit about it. If I'm wrong, at least I'll learn something 😅

Comment on lines 452 to 456
pub fn convert_clarity_error_to_transaction_result(
clarity_tx: &mut ClarityTx,
tx: &StacksTransaction,
cost_before: &ExecutionCost,
error: Error,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't like the fact that a function called convert_x_to_y modifies global(-ish) execution state (namely the clarity_tx). Granted, you didn't add this problem, it already existed before your change.

What you did add though was a second way to pass in the cost_before to which to reset this state. In case of CostOverflowError, that information is part of the error; in your case, it's passed in as an argument to the function.

Neither of the two are particularly pretty (although I like yours a little better, under the premise that we rename the function to make clearer what it does), but we really shouldn't have both at the same time.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, yeah this is a good point. Let me see if I can come up with a better refactoring.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed it to finalize_failed_transaction in ffc2334

@brice-stacks
Copy link
Copy Markdown
Contributor Author

  1. The miner is using its default setting of max 5 seconds per mempool sweep. In that case, it seems your budget-reset change seems to be no-op, because if we've blown past 30 seconds, we've certainly blown past 5.

The 5 second mempool sweep is only checked in between transactions, while the 30 second limit is checked during execution, so it can interrupt the processing of a single transaction.

  1. The miner has a high enough max_miner_time_ms for the budget reset to matter, but it's below the signer's block time budget. In this case, I think we're fine (although a miner second might not be a signer second, so the line is blurry).

This is a kind of implicit dance between the miner and signers. The wall-clock time limit is not a part of consensus, though it should be approximately enforced by the cost limits. A miner could potentially mine a valid block that takes 10 minutes to process if there was an issue in the cost tracking, but the signers would reject it because of their own wall-clock time limits. I think this is reasonable.

  1. The max_miner_time_ms is even higher. Let's say it's 70 seconds. In that case, the miner might create a block with these transactions:
  • Tx 1: 29 seconds
  • Tx 2: 29 seconds
  • Tx 3: 3 seconds

It would then propose the block to the signers, which would identify Tx 3 as problematic and, via #6964, the miner will drop Tx 3 even though it's the only nice transaction in the bunch.

A transaction should only be marked as problematic if the time for that individual transaction exceeds the limit, not the total time spent on the block. The path for feedback from the signer might be different. I'll take a closer look at that again now.

@brice-stacks
Copy link
Copy Markdown
Contributor Author

Yeah, so you're right. The signer will mark the transaction as problematic if it just happened to be the one executing while the deadline was reached. I'm thinking we should change that signer behavior, so that a transaction is only marked problematic if it alone exceeds a timeout.

@brice-stacks
Copy link
Copy Markdown
Contributor Author

miner_rejection_by_contract_publish_execution_time_expired is a legit failure, not flakiness. I'll update this test as needed.

@brice-stacks
Copy link
Copy Markdown
Contributor Author

I changed the signer behavior (and fixed the integration test) in f928a96.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants