Fix problematic transaction handling#7183
Conversation
Coverage Report for CI Build 25801484257Coverage decreased (-0.06%) to 85.648%Details
Uncovered Changes
Coverage Regressions5464 previously-covered lines in 98 files lost coverage.
Coverage Stats
💛 - Coveralls |
There was a problem hiding this comment.
Based on my reading of the code, I think there's something problematic with this change (pun not intended but taken anyway), unless I've missed something.
When the miner selects from the mempool, it applies a 30s execution time budget per transaction:
When the signer's node validates a block proposal, it applies a 60s execution time budget for the whole block:
Three scenarios:
1) The miner is using its default setting of max 5 seconds per mempool sweep. In that case, it seems your budget-reset change seems to be no-op, because if we've blown past 30 seconds, we've certainly blown past 5.
2) The miner has a high enough max_miner_time_ms for the budget reset to matter, but it's below the signer's block time budget. In this case, I think we're fine (although a miner second might not be a signer second, so the line is blurry).
3) The max_miner_time_ms is even higher. Let's say it's 70 seconds. In that case, the miner might create a block with these transactions:
- Tx 1: 29 seconds
- Tx 2: 29 seconds
- Tx 3: 3 seconds
It would then propose the block to the signers, which would identify Tx 3 as problematic and, via #6964, the miner will drop Tx 3 even though it's the only nice transaction in the bunch.
If I'm right, we might still say that that's fine, I just want us to be explicit about it. If I'm wrong, at least I'll learn something 😅
| pub fn convert_clarity_error_to_transaction_result( | ||
| clarity_tx: &mut ClarityTx, | ||
| tx: &StacksTransaction, | ||
| cost_before: &ExecutionCost, | ||
| error: Error, |
There was a problem hiding this comment.
I really don't like the fact that a function called convert_x_to_y modifies global(-ish) execution state (namely the clarity_tx). Granted, you didn't add this problem, it already existed before your change.
What you did add though was a second way to pass in the cost_before to which to reset this state. In case of CostOverflowError, that information is part of the error; in your case, it's passed in as an argument to the function.
Neither of the two are particularly pretty (although I like yours a little better, under the premise that we rename the function to make clearer what it does), but we really shouldn't have both at the same time.
There was a problem hiding this comment.
Hmm, yeah this is a good point. Let me see if I can come up with a better refactoring.
There was a problem hiding this comment.
I renamed it to finalize_failed_transaction in ffc2334
The 5 second mempool sweep is only checked in between transactions, while the 30 second limit is checked during execution, so it can interrupt the processing of a single transaction.
This is a kind of implicit dance between the miner and signers. The wall-clock time limit is not a part of consensus, though it should be approximately enforced by the cost limits. A miner could potentially mine a valid block that takes 10 minutes to process if there was an issue in the cost tracking, but the signers would reject it because of their own wall-clock time limits. I think this is reasonable.
A transaction should only be marked as problematic if the time for that individual transaction exceeds the limit, not the total time spent on the block. The path for feedback from the signer might be different. I'll take a closer look at that again now. |
|
Yeah, so you're right. The signer will mark the transaction as problematic if it just happened to be the one executing while the deadline was reached. I'm thinking we should change that signer behavior, so that a transaction is only marked problematic if it alone exceeds a timeout. |
|
|
|
I changed the signer behavior (and fixed the integration test) in f928a96. |
This branch does two things:
When a transaction hits the execution time limit, mark it as problematic
When a transaction is problematic, revert the cost accumulation from that transaction