feat: add automatic DA retriever walkback when P2P stalls and DA blocks too far ahead #3262
julienrbrt wants to merge 11 commits into
Claude finished @julienrbrt's task in 3m 22s

Code Review

Overview

This PR adds an automatic DA retriever walkback mechanism, triggered when the P2P layer stalls and DA-sourced blocks are too far ahead of the node's current height. The main idea: if P2P is stalled and DA events reference block heights far above what the node has applied, the DA retriever is rewound so that earlier DA heights are re-fetched.

The author's self-note ("do not like what the LLM has produced") is fair: there are genuine design and correctness concerns below.

Core Design Concern: One-height-at-a-time walkback is O(gap × 2s)

    if needsWalkback {
        s.walkbackActive.Store(true)
        const walkbackCooldown = 2 * time.Second
        s.walkbackCooldownUntil.Store(time.Now().Add(walkbackCooldown).UnixNano())
        // ...
        return daHeight - 1 // rewinds by exactly 1
    }

The walkback steps back one DA height per 2-second cooldown window. If the DA subscriber has advanced 500 heights ahead of the node (which is realistic after a sustained P2P outage), recovery takes at least 500 × 2 s = 1000 seconds before enough earlier heights are re-fetched. A better approach would be to jump back directly to the DA height that corresponds to the node's current height:

    // Instead of daHeight - 1, jump to the approximate DA height matching nodeHeight
    targetDAHeight := max(s.daRetrieverHeight.Load(), estimatedDAHeightForNode)
    return targetDAHeight

Even without
| Area | Finding |
|---|---|
| Design | One-at-a-time walkback is too slow for large gaps |
| Design | Walkback loops through empty DA heights on each cooldown tick |
| Correctness | `walkbackCheck` does not take a context, uses `s.ctx` directly |
| Testing | Missing test for cooldown-blocking behavior |
| Testing | `MockSubscriberHandler.HandleCatchup` is fragile (potential panic on nil) |
| Architecture | `WalkbackChecker` on `Subscriber` may be over-generalised |
The P2P stall detection logic (`p2pStalled` flag in `p2pWorkerLoop`) is clean and correctly distinguishes genuine stalls from DA-event-triggered cancellations. The `RewindTo`/`signalCatchup` plumbing in `Subscriber` is correct. The main practical concern is recovery speed for large gaps.
Codecov Report

❌ Patch coverage is

Additional details and impacted files

    @@            Coverage Diff             @@
    ##             main    #3262      +/-   ##
    ==========================================
    + Coverage   60.85%   60.91%   +0.06%
    ==========================================
      Files         127      127
      Lines       13762    13815      +53
    ==========================================
    + Hits         8375     8416      +41
    - Misses       4476     4484       +8
    - Partials      911      915       +4
Overview
Add automatic DA retriever walkback when P2P stalls and DA blocks too far ahead
Early WIP; I do not like what the LLM has produced.