Closed
Changes from 30 commits
34 commits
7e59d71
deterministic state sync
marcello33 Mar 25, 2026
fdb0082
temp bump heimdall to committed version for testing purposes
marcello33 Mar 26, 2026
3ae74d8
fix parsing
marcello33 Mar 26, 2026
405e5e6
fix unmarshalling of RecordListVisibleAtHeightResponse
marcello33 Mar 26, 2026
1893557
change heimdall dep for testing
marcello33 Mar 26, 2026
f48218f
Merge branch 'develop' into mardizzone/POS-3441_deterministic-ss
marcello33 Mar 27, 2026
f3446c4
add DeterministicStateSyncBlock to GatherForks
marcello33 Mar 27, 2026
2874ed1
Merge branch 'develop' into mardizzone/POS-3441_deterministic-ss
marcello33 Mar 27, 2026
2934d3e
Merge branch 'develop' into mardizzone/POS-3441_deterministic-ss
marcello33 Mar 31, 2026
47f48bc
better comment on go.mod
marcello33 Mar 31, 2026
13a51b8
fix linter
marcello33 Mar 31, 2026
96d8797
address comments
marcello33 Apr 1, 2026
c6f0d55
address comments
marcello33 Apr 1, 2026
a94f1a9
remove omitempty
marcello33 Apr 1, 2026
46e1750
update banner
marcello33 Apr 1, 2026
653d8b6
timeout for StateSyncEventsAtHeight
marcello33 Apr 1, 2026
bdb52b0
address comments
marcello33 Apr 1, 2026
cd46ca1
address comments
marcello33 Apr 1, 2026
d508db4
added tests
marcello33 Apr 1, 2026
360e533
address minor err shadowing and fix lint
marcello33 Apr 1, 2026
c1c1e3d
test single endpoint
marcello33 Apr 3, 2026
9697c13
update heimdall-v2 dependency to DSS-test branch
marcello33 Apr 3, 2026
925c5e5
address comments
marcello33 Apr 4, 2026
5dfe949
solve lint issue
marcello33 Apr 4, 2026
8d06151
address comments
marcello33 Apr 4, 2026
3a98421
address comments
marcello33 Apr 4, 2026
a1fa6b9
address comments
marcello33 Apr 4, 2026
dce6483
address comments
marcello33 Apr 4, 2026
ca9ee85
unify endpoints
marcello33 Apr 6, 2026
85521b2
address comments
marcello33 Apr 6, 2026
4326272
remove dead code
marcello33 Apr 6, 2026
5c6214f
Merge branch 'develop' into mardizzone/POS-3441_deterministic-ss
marcello33 Apr 7, 2026
085fdd5
internal/ethapi: add MaxUsedGas field to eth_simulateV1 response (#32…
Rhovian Mar 4, 2026
e6bc0e6
Merge branch 'develop' into mardizzone/POS-3441_deterministic-ss
marcello33 May 15, 2026
30 changes: 25 additions & 5 deletions consensus/bor/bor.go
@@ -1760,12 +1760,32 @@ func (c *Bor) CommitStates(
// Wait for heimdall to be synced before fetching state sync events
c.spanStore.waitUntilHeimdallIsSynced(c.ctx)

eventRecords, err = c.HeimdallClient.StateSyncEvents(c.ctx, from, to.Unix())
if err != nil {
log.Error("Error occurred when fetching state sync events", "fromID", from, "to", to.Unix(), "err", err)
if c.config.IsDeterministicStateSync(header.Number) {
log.Info("Using deterministic state sync", "cutoff", to.Unix())

eventRecords, err = c.HeimdallClient.StateSyncEventsByTime(c.ctx, from, to.Unix())
if err != nil {
// Liveness-over-safety tradeoff (matches pre-fork behavior):
// FetchWithRetry already retries aggressively and MultiHeimdallClient
// provides failover, so errors reaching here are persistent. Returning
// empty lets the proposer build a block with 0 state syncs. If other
// validators succeed, they will derive a different state root and reject
// this block — the proposer misses a slot but no silent divergence
// occurs. Skipped events are retried at the next sprint since `from`
// is derived from the on-chain LastStateId.
log.Error("Error fetching deterministic state sync events", "fromID", from, "to", to.Unix(), "err", err)

stateSyncs := make([]*types.StateSyncData, 0)
return stateSyncs, nil
return make([]*types.StateSyncData, 0), nil
Comment on lines +1772 to +1787
Copilot AI Apr 4, 2026

Post-fork deterministic mode currently logs and returns an empty state-sync set when StateSyncEventsByTime errors. That outcome depends on each node’s local Heimdall connectivity; if some validators succeed and others hit a transient error, they will apply different state updates and compute different state roots for the same block. For consensus-critical finalization, this should fail block processing (return the error) or use a deterministic fallback that guarantees all nodes derive the same result (e.g., only treat specific, globally-observable errors as skippable).
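
A minimal Go sketch of the stricter policy suggested here, under stated assumptions: `errSkippable`, `errTransient`, `event`, `failWith` and `commitOrFail` are hypothetical names for illustration, not identifiers from bor. It propagates every fetch error except one explicitly classified as safe to skip on all nodes.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinels: errSkippable stands in for an error class that every
// node is assumed to observe identically; errTransient is a node-local failure.
var (
	errSkippable = errors.New("skippable on all nodes")
	errTransient = errors.New("connection reset")
)

// event is a placeholder for a state sync record.
type event struct{ ID uint64 }

// failWith builds a fetch function that always fails with err.
func failWith(err error) func() ([]event, error) {
	return func() ([]event, error) { return nil, err }
}

// commitOrFail skips only on errSkippable and propagates every other error,
// so a node-local failure can never silently produce an empty result.
func commitOrFail(fetch func() ([]event, error)) ([]event, error) {
	events, err := fetch()
	if err == nil {
		return events, nil
	}
	if errors.Is(err, errSkippable) {
		return []event{}, nil
	}
	return nil, fmt.Errorf("fetching state sync events: %w", err)
}

func main() {
	_, err := commitOrFail(failWith(errTransient))
	fmt.Println("transient error propagated:", err != nil)

	events, err := commitOrFail(failWith(errSkippable))
	fmt.Println("skippable error:", len(events), err)
}
```

Which real errors, if any, qualify as skippable on every node is exactly the open question in this thread; the sketch only shows the control flow.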

Collaborator Author

The pre-fork path has the same behavior (log + return empty). I'll probably keep this comment open and discuss with the team.
Anyway, the risk is mitigated by multiple factors:

  1. FetchWithRetry retries aggressively
  2. waitUntilHeimdallIsSynced gates CommitStates until heimdall is caught up
  3. CommitStates only runs at sprint boundaries

Contributor

Copilot's comment does make sense in the worst-case scenario: if a BP tries to fetch state syncs when there are state syncs to fetch but fails, its block would be rejected, causing span rotation. @claude could you take a look and list all the possible errors that could be returned from StateSyncEventsByTime and StateSyncEvents?


@claude claude Bot May 14, 2026



Error Analysis: StateSyncEventsByTime vs StateSyncEvents

  • Gather context and read relevant source files
  • Trace all error paths in StateSyncEventsByTime (HTTP, gRPC, app, failover)
  • Trace all error paths in StateSyncEvents (HTTP, gRPC, app, failover)
  • Compare error handling between pre-fork and post-fork paths in CommitStates
  • Post final analysis

Both the pre-fork (StateSyncEvents) and post-fork (StateSyncEventsByTime) code paths handle errors identically in CommitStates: they log the error and return an empty state sync set with a nil error (bor.go:1776-1778 and bor.go:1782-1787). So the concern Jerry raises (BP builds a block with 0 state syncs, other validators reject it due to state root mismatch, causing span rotation) applies equally to both paths. Here are all the possible errors:


Errors from StateSyncEventsByTime (post-fork)

HTTP transport (client.go:275-324) — uses FetchWithRetry per page, with a 1-minute global timeout:

| Error | Source | Retried? |
| --- | --- | --- |
| url.Parse failure | stateSyncsByTimeURL (malformed urlString) | No (immediate) |
| context.DeadlineExceeded | 1-minute global timeout expires during pagination | No |
| context.DeadlineExceeded | Per-request 30s timeout in internalFetchWithTimeout (only surfaces if retries also time out) | Yes (by FetchWithRetry) |
| context.Canceled | Parent c.ctx cancelled (node shutdown) | No |
| ErrShutdownDetected | closeCh closed during retry loop | No (immediate return) |
| ErrServiceUnavailable | Heimdall returns HTTP 503 (endpoint not activated yet) | No (immediate return) |
| HTTPStatusError (wraps ErrNotSuccessfulResponse) | Any non-2xx, non-503 HTTP status (e.g. 500, 502, 504) | Yes (by FetchWithRetry) |
| ErrNoResponse | 204 No Content → nil body → Fetch returns ErrNoResponse | Yes |
| net.Error (connection refused, DNS, TCP timeout) | Network-level transport failure | Yes |
| json.Unmarshal / cdc.UnmarshalJSON error | Malformed response body from Heimdall | Yes |
| http.NewRequestWithContext error | Should not happen in practice | Yes |
| io.ReadAll error from MaxBytesReader | Response body exceeds 128 MB limit | Yes |

gRPC transport (heimdallgrpc/state_sync.go:91-152) — per-page timeout + 1-minute global timeout:

| Error | Source | Retried? |
| --- | --- | --- |
| context.DeadlineExceeded | 1-minute stateSyncTotalTimeout expires | No |
| context.DeadlineExceeded | Per-page defaultTimeout (5s) expires | No (breaks loop) |
| gRPC status errors (Unavailable, Internal, etc.) | Heimdall gRPC server returns error | No |
| context.Canceled | Parent context cancelled | No |

In-process app (heimdallapp/state_sync.go:43-74) — direct keeper calls, no timeouts:

| Error | Source | Retried? |
| --- | --- | --- |
| Keeper/DB errors | queryServer.GetStateSyncsByTime fails (store corruption, missing data) | No |

Failover wrapper (failover_client.go:113-121) — adds 1-minute timeout before callWithFailover:

| Error | Source | Retried? |
| --- | --- | --- |
| All transport errors above | Passed through from underlying transport | Cascaded to next endpoint if failover-eligible |
| context.DeadlineExceeded | 30s attemptTimeout per endpoint in callWithFailover | Cascaded (tries next endpoint) |
| Last endpoint's error | All endpoints exhausted | No (returned as final error) |

Failover-eligible errors (these trigger the cascade): net.Error, ErrNoResponse, HTTPStatusError with status ≥ 500, context.DeadlineExceeded, context.Canceled (when parent ctx is still alive).

Not failover-eligible (returned immediately): ErrServiceUnavailable (503), ErrShutdownDetected, parent ctx cancelled.
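
The two lists above can be sketched as a single classifier. This is an illustration only: `shouldFailover`, `httpStatusError` and the lowercase sentinels are hypothetical stand-ins that mirror the names in the analysis, not the actual bor types, and the `parentAlive` flag is an assumed simplification of checking the parent context.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
)

// Hypothetical sentinels mirroring the error names used in the analysis.
var (
	errNoResponse         = errors.New("no response")
	errServiceUnavailable = errors.New("service unavailable")
	errShutdownDetected   = errors.New("shutdown detected")
)

// httpStatusError is an assumed shape for an error carrying an HTTP status.
type httpStatusError struct{ code int }

func (e *httpStatusError) Error() string { return fmt.Sprintf("http status %d", e.code) }

// shouldFailover reports whether err should make the caller try the next
// endpoint. parentAlive says whether the caller's own context is still usable.
func shouldFailover(err error, parentAlive bool) bool {
	if err == nil || !parentAlive {
		return false
	}
	// Returned immediately, never cascaded.
	if errors.Is(err, errServiceUnavailable) || errors.Is(err, errShutdownDetected) {
		return false
	}
	if errors.Is(err, errNoResponse) ||
		errors.Is(err, context.DeadlineExceeded) ||
		errors.Is(err, context.Canceled) {
		return true
	}
	var statusErr *httpStatusError
	if errors.As(err, &statusErr) {
		return statusErr.code >= 500
	}
	var netErr net.Error
	return errors.As(err, &netErr)
}

func main() {
	fmt.Println(shouldFailover(errNoResponse, true))
	fmt.Println(shouldFailover(&httpStatusError{code: 502}, true))
	fmt.Println(shouldFailover(errServiceUnavailable, true))
	fmt.Println(shouldFailover(context.Canceled, false))
}
```

Checking the non-eligible sentinels first keeps a 503 from being treated like any other 5xx status.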


Errors from StateSyncEvents (pre-fork)

HTTP transport (client.go:102-153) — uses bare Fetch per page (no retry), no global timeout:

| Error | Source | Retried? |
| --- | --- | --- |
| url.Parse failure | stateSyncURL | No |
| context.DeadlineExceeded | Per-request 30s timeout in internalFetchWithTimeout | No (bare Fetch, no retry) |
| context.Canceled | Parent context cancelled | No |
| ErrServiceUnavailable | HTTP 503 | No |
| HTTPStatusError | Non-2xx, non-503 status | No |
| ErrNoResponse | 204 → nil body | No |
| net.Error | Network failure | No |
| JSON/proto unmarshal error | Malformed body | No |
| io.ReadAll error | Body exceeds 128 MB | No |

gRPC transport (heimdallgrpc/state_sync.go:22-87) — per-page timeout + 1-minute global:

Same error set as StateSyncEventsByTime gRPC (identical structure).

In-process app (heimdallapp/state_sync.go:15-39):

| Error | Source |
| --- | --- |
| ClerkKeeper.GetEventRecord failure | fromRecord lookup fails |
| ClerkKeeper.GetEventRecordListWithTime failure | Event list query fails |

Failover wrapper (failover_client.go:107-111) — no pre-set timeout (unlike StateSyncEventsByTime):

Same cascade logic as above, but the parent context has no 1-minute cap, so the 30s attemptTimeout is the only bound per endpoint.


Key difference: StateSyncEventsByTime is more resilient than StateSyncEvents

| Aspect | StateSyncEvents (pre-fork) | StateSyncEventsByTime (post-fork) |
| --- | --- | --- |
| HTTP retry | None (bare Fetch) | FetchWithRetry (retries every 5s until timeout) |
| Global timeout | None (HTTP), 1 min (gRPC) | 1 min (all transports) |
| Failover pre-timeout | None | 1-minute context.WithTimeout set before callWithFailover |

So transient errors (brief 5xx, connection reset) are more likely to surface from StateSyncEvents than from StateSyncEventsByTime.
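
To make the retry difference concrete, here is a rough sketch of a retry loop bounded by a context deadline. It is not the FetchWithRetry implementation: `retryUntilDone`, `flakyAfter` and `runFlaky` are hypothetical helpers, and the interval is shortened to 1ms so the example finishes quickly.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// retryUntilDone calls attempt until it succeeds or ctx is done, waiting
// interval between tries. It returns the number of attempts made.
func retryUntilDone(ctx context.Context, interval time.Duration, attempt func() error) (int, error) {
	tries := 0
	for {
		tries++
		err := attempt()
		if err == nil {
			return tries, nil
		}
		select {
		case <-ctx.Done():
			return tries, fmt.Errorf("giving up after %d attempts: %w (last error: %v)", tries, ctx.Err(), err)
		case <-time.After(interval):
		}
	}
}

// flakyAfter returns an attempt function that fails for the first n calls.
func flakyAfter(n int) func() error {
	calls := 0
	return func() error {
		calls++
		if calls <= n {
			return errors.New("temporary failure")
		}
		return nil
	}
}

// runFlaky combines the two with a 1ms interval and a budget in milliseconds.
func runFlaky(failures, budgetMillis int) (int, error) {
	ctx, cancel := context.WithTimeout(context.Background(), time.Duration(budgetMillis)*time.Millisecond)
	defer cancel()
	return retryUntilDone(ctx, time.Millisecond, flakyAfter(failures))
}

func main() {
	tries, err := runFlaky(2, 1000) // two failures, then success
	fmt.Println(tries, err)

	_, err = runFlaky(1_000_000, 20) // never succeeds within the budget
	fmt.Println(err != nil)
}
```

A bare single attempt, as in the pre-fork HTTP path, corresponds to calling `attempt` once and returning whatever it yields.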


The concern Jerry raises

The worry: if a BP's Heimdall call fails, it builds a block with 0 state syncs. If other validators' Heimdall calls succeed, they derive a different state root and reject the block → the BP misses a slot → potential span rotation.

This is a real scenario that applies to both pre-fork and post-fork paths, since both use the same log.Error + return empty, nil pattern. The code comments at bor.go:1768-1775 document this as an intentional liveness-over-safety tradeoff: the proposer misses a slot rather than crashing entirely, and FetchWithRetry + MultiHeimdallClient failover make persistent failures unlikely.

The risk is mitigated by:

  1. waitUntilHeimdallIsSynced gates entry — Heimdall must be responsive
  2. FetchWithRetry retries aggressively (new in post-fork path)
  3. MultiHeimdallClient cascades across multiple endpoints
  4. CommitStates only runs at sprint boundaries (every 16 blocks)
  5. Skipped events are retried next sprint (from is derived from on-chain LastStateId)

If the team wants to eliminate the slot-miss risk entirely, the alternative would be to return the error from CommitStates (which would cause the block production attempt to fail and be retried), but that risks an infinite retry loop if Heimdall is persistently down. The current design accepts a missed slot as the lesser evil.
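
The "retried next sprint" mitigation can be illustrated with a toy model. Everything here is assumed for illustration: `chain`, `pending`, `fetch` and `commitSprint` are hypothetical, and computing `from` as `lastStateID + 1` is an assumption about how the range start is derived from the on-chain LastStateId.

```go
package main

import (
	"errors"
	"fmt"
)

// chain is a toy stand-in for on-chain state; lastStateID plays the role of
// LastStateId in the discussion above.
type chain struct{ lastStateID uint64 }

// pending holds the IDs a hypothetical Heimdall node would return.
var pending = []uint64{1, 2, 3}

// fetch returns every pending ID >= from, or an error when failing is set.
func fetch(from uint64, failing bool) ([]uint64, error) {
	if failing {
		return nil, errors.New("heimdall unreachable")
	}
	var out []uint64
	for _, id := range pending {
		if id >= from {
			out = append(out, id)
		}
	}
	return out, nil
}

// commitSprint mimics the log-and-return-empty pattern: on error nothing is
// committed and lastStateID is unchanged, so the next sprint asks again for
// the same range. It returns the number of events committed.
func commitSprint(c *chain, failing bool) int {
	from := c.lastStateID + 1 // assumed derivation of `from`
	events, err := fetch(from, failing)
	if err != nil {
		fmt.Println("fetch failed, committing 0 events:", err)
		return 0
	}
	for _, id := range events {
		c.lastStateID = id
	}
	return len(events)
}

func main() {
	c := &chain{}
	fmt.Println(commitSprint(c, true), c.lastStateID)  // failed sprint: nothing advances
	fmt.Println(commitSprint(c, false), c.lastStateID) // next sprint catches up
}
```

The model only covers the proposer's local view; it does not capture other validators rejecting the empty block.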

Collaborator Author

WDYT @cffls
I left this open intentionally - I'm down to patch this, just wondering if we have to - as it would be drifting from the current pre-fork behavior.
Thanks

}
Comment thread
marcello33 marked this conversation as resolved.
} else {
eventRecords, err = c.HeimdallClient.StateSyncEvents(c.ctx, from, to.Unix())
if err != nil {
// Pre-fork: preserve existing behavior (returning empty, no error)
log.Error("Error occurred when fetching state sync events", "fromID", from, "to", to.Unix(), "err", err)

stateSyncs := make([]*types.StateSyncData, 0)
return stateSyncs, nil
}
}

// This if statement checks if there are any state sync record overrides configured for the current block number.
197 changes: 197 additions & 0 deletions consensus/bor/bor_test.go
@@ -119,6 +119,15 @@
func (f *failingHeimdallClient) FetchStatus(ctx context.Context) (*ctypes.SyncInfo, error) {
return nil, errors.New("fetch status failed")
}
func (f *failingHeimdallClient) GetBlockHeightByTime(_ context.Context, _ int64) (int64, error) {
return 0, errors.New("get block height by time failed")
}
func (f *failingHeimdallClient) StateSyncEventsAtHeight(_ context.Context, _ uint64, _ int64, _ int64) ([]*clerk.EventRecordWithTime, error) {
return nil, errors.New("state sync events at height failed")
}
func (f *failingHeimdallClient) StateSyncEventsByTime(_ context.Context, _ uint64, _ int64) ([]*clerk.EventRecordWithTime, error) {
return nil, errors.New("state sync events by time failed")
}

// newStateDBForTest creates a fresh state database for testing.
func newStateDBForTest(t *testing.T, root common.Hash) *state.StateDB {
@@ -2974,6 +2983,15 @@
func (m *mockHeimdallClient) FetchStatus(ctx context.Context) (*ctypes.SyncInfo, error) {
return &ctypes.SyncInfo{CatchingUp: false}, nil
}
func (m *mockHeimdallClient) GetBlockHeightByTime(_ context.Context, _ int64) (int64, error) {
return 0, nil
}
func (m *mockHeimdallClient) StateSyncEventsAtHeight(_ context.Context, _ uint64, _ int64, _ int64) ([]*clerk.EventRecordWithTime, error) {
return nil, nil
}
func (m *mockHeimdallClient) StateSyncEventsByTime(_ context.Context, _ uint64, _ int64) ([]*clerk.EventRecordWithTime, error) {
return m.events, nil
}
func TestEncodeSigHeader_WithBaseFee(t *testing.T) {
t.Parallel()
h := &types.Header{
@@ -5259,3 +5277,182 @@
require.NotErrorIs(t, err, errMissingGiuglianoFields)
}
}

// trackingHeimdallClient records which IHeimdallClient methods were called.
// It returns configurable results and tracks call counts for assertions.
type trackingHeimdallClient struct {
// Call counters
stateSyncEventsCalled int
stateSyncEventsByTimeCalled int

// Configurable return values
events []*clerk.EventRecordWithTime
eventsErr error
eventsByTime []*clerk.EventRecordWithTime
eventsByTimeErr error
}

func (t *trackingHeimdallClient) Close() {}

Check failure on line 5295 in consensus/bor/bor_test.go

SonarQubeCloud / SonarCloud Code Analysis

Add a nested comment explaining why this function is empty or complete the implementation.

See more on https://sonarcloud.io/project/issues?id=0xPolygon_bor&issues=AZ1FPy8poC2Lq8aFv3Hb&open=AZ1FPy8poC2Lq8aFv3Hb&pullRequest=2177
func (t *trackingHeimdallClient) StateSyncEvents(context.Context, uint64, int64) ([]*clerk.EventRecordWithTime, error) {
t.stateSyncEventsCalled++
return t.events, t.eventsErr
}
func (t *trackingHeimdallClient) GetSpan(context.Context, uint64) (*borTypes.Span, error) {
return nil, nil
}
func (t *trackingHeimdallClient) GetLatestSpan(context.Context) (*borTypes.Span, error) {
return nil, nil
}
func (t *trackingHeimdallClient) FetchCheckpoint(context.Context, int64) (*checkpoint.Checkpoint, error) {
return nil, nil
}
func (t *trackingHeimdallClient) FetchCheckpointCount(context.Context) (int64, error) {
return 0, nil
}
func (t *trackingHeimdallClient) FetchMilestone(context.Context) (*milestone.Milestone, error) {
return nil, nil
}
func (t *trackingHeimdallClient) FetchMilestoneCount(context.Context) (int64, error) {
return 0, nil
}
func (t *trackingHeimdallClient) FetchStatus(context.Context) (*ctypes.SyncInfo, error) {
return &ctypes.SyncInfo{CatchingUp: false}, nil
}
func (t *trackingHeimdallClient) StateSyncEventsByTime(context.Context, uint64, int64) ([]*clerk.EventRecordWithTime, error) {
t.stateSyncEventsByTimeCalled++
return t.eventsByTime, t.eventsByTimeErr
}

// deterministicBorConfig returns a BorConfig with DeterministicStateSyncBlock set.
func deterministicBorConfig(forkBlock int64) *params.BorConfig {
return &params.BorConfig{
Sprint: map[string]uint64{"0": 16},
Period: map[string]uint64{"0": 2},
IndoreBlock: big.NewInt(0),
StateSyncConfirmationDelay: map[string]uint64{"0": 0},
RioBlock: big.NewInt(1000000),
DeterministicStateSyncBlock: big.NewInt(forkBlock),
}
}

func TestCommitStates_DeterministicForkSwitch(t *testing.T) {
t.Parallel()

addr1 := common.HexToAddress("0x1")
sp := &fakeSpanner{vals: []*valset.Validator{{Address: addr1, VotingPower: 1}}}
mockGC := &mockGenesisContractForCommitStatesIndore{lastStateID: 0}

// Fork activates at block 100
borCfg := deterministicBorConfig(100)
genesisTime := uint64(time.Now().Unix()) - 200
chain, b := newChainAndBorForTest(t, sp, borCfg, true, addr1, genesisTime)
b.GenesisContractsClient = mockGC

genesis := chain.HeaderChain().GetHeaderByNumber(0)
now := time.Now()

// Pre-fork: block 16 should use StateSyncEvents (old legacy path)
tracker := &trackingHeimdallClient{
events: []*clerk.EventRecordWithTime{},
}
b.SetHeimdallClient(tracker)

stateDb := newStateDBForTest(t, genesis.Root)
h := &types.Header{Number: big.NewInt(16), ParentHash: genesis.Hash(), Time: uint64(now.Unix())}

_, err := b.CommitStates(stateDb, h, statefull.ChainContext{Chain: chain.HeaderChain(), Bor: b})
require.NoError(t, err)
require.Equal(t, 1, tracker.stateSyncEventsCalled, "pre-fork should call StateSyncEvents")
require.Equal(t, 0, tracker.stateSyncEventsByTimeCalled, "pre-fork should not call StateSyncEventsByTime")

// Post-fork: block 112 should use StateSyncEventsByTime (deterministic state sync)
tracker2 := &trackingHeimdallClient{
eventsByTime: []*clerk.EventRecordWithTime{},
}
b.SetHeimdallClient(tracker2)

stateDb2 := newStateDBForTest(t, genesis.Root)
h2 := &types.Header{Number: big.NewInt(112), ParentHash: genesis.Hash(), Time: uint64(now.Unix())}

_, err = b.CommitStates(stateDb2, h2, statefull.ChainContext{Chain: chain.HeaderChain(), Bor: b})
require.NoError(t, err)
require.Equal(t, 0, tracker2.stateSyncEventsCalled, "post-fork should not call StateSyncEvents")
require.Equal(t, 1, tracker2.stateSyncEventsByTimeCalled, "post-fork should call StateSyncEventsByTime")
}

func TestCommitStates_ResilientPostFork(t *testing.T) {
t.Parallel()

addr1 := common.HexToAddress("0x1")
sp := &fakeSpanner{vals: []*valset.Validator{{Address: addr1, VotingPower: 1}}}
mockGC := &mockGenesisContractForCommitStatesIndore{lastStateID: 0}

// Fork activates at block 0 so all blocks are post-fork
borCfg := deterministicBorConfig(0)
genesisTime := uint64(time.Now().Unix()) - 200
chain, b := newChainAndBorForTest(t, sp, borCfg, true, addr1, genesisTime)
b.GenesisContractsClient = mockGC

genesis := chain.HeaderChain().GetHeaderByNumber(0)
now := time.Now()

// StateSyncEventsByTime returns an error
tracker := &trackingHeimdallClient{
eventsByTimeErr: errors.New("heimdall state sync by time failed"),
}
b.SetHeimdallClient(tracker)

stateDb := newStateDBForTest(t, genesis.Root)
h := &types.Header{Number: big.NewInt(16), ParentHash: genesis.Hash(), Time: uint64(now.Unix())}

result, err := b.CommitStates(stateDb, h, statefull.ChainContext{Chain: chain.HeaderChain(), Bor: b})

// Post-fork errors are resilient: log + return empty, no error
require.NoError(t, err, "post-fork should not return error on StateSyncEventsByTime failure")
require.Empty(t, result, "post-fork should return empty on StateSyncEventsByTime failure")

// StateSyncEventsByTime should have been called
require.Equal(t, 1, tracker.stateSyncEventsByTimeCalled,
"StateSyncEventsByTime should have been called once")
// Must not fallback to StateSyncEvents
require.Equal(t, 0, tracker.stateSyncEventsCalled,
"post-fork should NOT fall back to StateSyncEvents on error")
}

func TestCommitStates_ResilientPostFork_ReturnsEmptyOnError(t *testing.T) {
t.Parallel()

addr1 := common.HexToAddress("0x1")
sp := &fakeSpanner{vals: []*valset.Validator{{Address: addr1, VotingPower: 1}}}
mockGC := &mockGenesisContractForCommitStatesIndore{lastStateID: 0}

borCfg := deterministicBorConfig(0)
genesisTime := uint64(time.Now().Unix()) - 200
chain, b := newChainAndBorForTest(t, sp, borCfg, true, addr1, genesisTime)
b.GenesisContractsClient = mockGC

genesis := chain.HeaderChain().GetHeaderByNumber(0)
now := time.Now()

// StateSyncEventsByTime fails with an HTTP error
tracker := &trackingHeimdallClient{
eventsByTimeErr: errors.New("HTTP 503: service unavailable"),
}
b.SetHeimdallClient(tracker)

stateDb := newStateDBForTest(t, genesis.Root)
h := &types.Header{Number: big.NewInt(16), ParentHash: genesis.Hash(), Time: uint64(now.Unix())}

result, err := b.CommitStates(stateDb, h, statefull.ChainContext{Chain: chain.HeaderChain(), Bor: b})

// Post-fork is resilient: returns empty on error, does not propagate
require.NoError(t, err, "post-fork should not return error on StateSyncEventsByTime failure")
require.Empty(t, result, "post-fork should return empty on StateSyncEventsByTime failure")

// StateSyncEventsByTime should have been called
require.Equal(t, 1, tracker.stateSyncEventsByTimeCalled,
"StateSyncEventsByTime should have been called")
// Old path should not have been called as fallback
require.Equal(t, 0, tracker.stateSyncEventsCalled,
"post-fork should not fall back to StateSyncEvents")
}
1 change: 1 addition & 0 deletions consensus/bor/heimdall.go
@@ -14,6 +14,7 @@ import (
//go:generate mockgen -source=heimdall.go -destination=../../tests/bor/mocks/IHeimdallClient.go -package=mocks
type IHeimdallClient interface {
StateSyncEvents(ctx context.Context, fromID uint64, to int64) ([]*clerk.EventRecordWithTime, error)
StateSyncEventsByTime(ctx context.Context, fromID uint64, toTime int64) ([]*clerk.EventRecordWithTime, error)
GetSpan(ctx context.Context, spanID uint64) (*types.Span, error)
GetLatestSpan(ctx context.Context) (*types.Span, error)
FetchCheckpoint(ctx context.Context, number int64) (*checkpoint.Checkpoint, error)
68 changes: 68 additions & 0 deletions consensus/bor/heimdall/client.go
@@ -95,6 +95,8 @@ const (
fetchLatestSpan = "bor/spans/latest"

fetchStatus = "/status"

fetchStateSyncsByTimePath = "clerk/state-syncs-by-time"
)

// StateSyncEvents fetches the state sync events from heimdall
@@ -265,6 +267,63 @@ func (h *HeimdallClient) FetchStatus(ctx context.Context) (*ctypes.SyncInfo, err
return response, nil
}

// StateSyncsByTimeResponse uses the proto-generated response type from heimdall-v2.
type StateSyncsByTimeResponse = clerkTypes.StateSyncsByTimeResponse

Comment on lines +270 to +272
// StateSyncEventsByTime fetches state sync events using the combined endpoint that
// resolves the Heimdall height from the cutoff time internally.
func (h *HeimdallClient) StateSyncEventsByTime(ctx context.Context, fromID uint64, toTime int64) ([]*clerk.EventRecordWithTime, error) {
// Global timeout bounding the entire paginated fetch, matching the gRPC
// implementation's stateSyncTotalTimeout (1 minute).
ctx, cancel := context.WithTimeout(ctx, 1*time.Minute)
defer cancel()

ctx = WithRequestType(ctx, StateSyncByTimeRequest)

eventRecords := make([]*clerk.EventRecordWithTime, 0)

for {
u, err := stateSyncsByTimeURL(h.urlString, fromID, toTime)
if err != nil {
return nil, err
}

log.Debug("Fetching state sync events by time", "queryParams", u.RawQuery)

response, err := FetchWithRetry[StateSyncsByTimeResponse](ctx, h.client, u, h.closeCh)
if err != nil {
return nil, err
}

for _, e := range response.EventRecords {
record := &clerk.EventRecordWithTime{
EventRecord: clerk.EventRecord{
ID: e.Id,
ChainID: e.BorChainId,
Comment thread
claude[bot] marked this conversation as resolved.
Contract: common.HexToAddress(e.Contract),
Data: e.Data,
LogIndex: e.LogIndex,
TxHash: common.HexToHash(e.TxHash),
},
Time: e.RecordTime,
}
eventRecords = append(eventRecords, record)
}

if len(response.EventRecords) < stateFetchLimit {
break
}

fromID += uint64(stateFetchLimit)
}

sort.SliceStable(eventRecords, func(i, j int) bool {
return eventRecords[i].ID < eventRecords[j].ID
})

return eventRecords, nil
}
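
The pagination loop in StateSyncEventsByTime can be sketched in isolation. This is a toy model, not the client code: `pageLimit` stands in for stateFetchLimit with an arbitrary value of 3, and `fetchPage` and `fetchAll` are hypothetical helpers operating on an in-memory, ascending list of IDs.

```go
package main

import "fmt"

// pageLimit plays the role of stateFetchLimit; the value 3 is arbitrary.
const pageLimit = 3

// fetchPage is a stand-in for one request: it returns up to pageLimit IDs
// from ids that are >= fromID. ids must be sorted in ascending order.
func fetchPage(ids []uint64, fromID uint64) []uint64 {
	page := make([]uint64, 0, pageLimit)
	for _, id := range ids {
		if id >= fromID && len(page) < pageLimit {
			page = append(page, id)
		}
	}
	return page
}

// fetchAll follows the loop in the diff: advance fromID by the page size and
// stop at the first page shorter than pageLimit.
func fetchAll(ids []uint64, fromID uint64) []uint64 {
	all := make([]uint64, 0)
	for {
		page := fetchPage(ids, fromID)
		all = append(all, page...)
		if len(page) < pageLimit {
			return all
		}
		fromID += pageLimit
	}
}

func main() {
	fmt.Println(fetchAll([]uint64{1, 2, 3, 4, 5, 6, 7}, 1))
}
```

In this toy model, advancing `fromID` by the page size relies on IDs being contiguous: with a gap such as `{1, 2, 4, 5, 6}` the ID 4 is returned twice. Whether gaps can occur in Heimdall's event IDs is not stated in this diff.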

func FetchOnce[T any](ctx context.Context, client http.Client, url *url.URL, closeCh chan struct{}) (*T, error) {
request := &Request{client: client, url: url, start: time.Now()}
return Fetch[T](ctx, request)
@@ -437,6 +496,15 @@ func statusURL(urlString string) (*url.URL, error) {
return makeURL(urlString, fetchStatus, "")
}

func stateSyncsByTimeURL(urlString string, fromID uint64, toTime int64) (*url.URL, error) {
t := time.Unix(toTime, 0).UTC()
params := url.Values{}
params.Set("from_id", fmt.Sprintf("%d", fromID))
params.Set("to_time", t.Format(time.RFC3339Nano))
params.Set("pagination.limit", fmt.Sprintf("%d", stateFetchLimit))
return makeURL(urlString, fetchStateSyncsByTimePath, params.Encode())
}
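
For reference, a standalone sketch of the query string this helper produces, using only the standard library. `buildQuery` is a hypothetical function that takes the limit as a parameter instead of reading stateFetchLimit, and the value 50 in the example is arbitrary.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
	"time"
)

// buildQuery encodes the same three parameters as stateSyncsByTimeURL.
func buildQuery(fromID uint64, toTime int64, limit int) string {
	params := url.Values{}
	params.Set("from_id", strconv.FormatUint(fromID, 10))
	params.Set("to_time", time.Unix(toTime, 0).UTC().Format(time.RFC3339Nano))
	params.Set("pagination.limit", strconv.Itoa(limit))
	return params.Encode() // keys are emitted in sorted order, values are escaped
}

func main() {
	fmt.Println(buildQuery(1, 1700000000, 50))
	// prints "from_id=1&pagination.limit=50&to_time=2023-11-14T22%3A13%3A20Z"
}
```

Because the cutoff is built from whole seconds, the RFC3339Nano output carries no fractional part.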

func makeURL(urlString, rawPath, rawQuery string) (*url.URL, error) {
u, err := url.Parse(urlString)
if err != nil {
12 changes: 12 additions & 0 deletions consensus/bor/heimdall/failover_client.go
@@ -29,6 +29,7 @@ const (
// running into Go's covariant-slice restriction.
type Endpoint interface {
StateSyncEvents(ctx context.Context, fromID uint64, to int64) ([]*clerk.EventRecordWithTime, error)
StateSyncEventsByTime(ctx context.Context, fromID uint64, toTime int64) ([]*clerk.EventRecordWithTime, error)
GetSpan(ctx context.Context, spanID uint64) (*types.Span, error)
GetLatestSpan(ctx context.Context) (*types.Span, error)
FetchCheckpoint(ctx context.Context, number int64) (*checkpoint.Checkpoint, error)
@@ -109,6 +110,17 @@ func (f *MultiHeimdallClient) StateSyncEvents(ctx context.Context, fromID uint64
})
}

func (f *MultiHeimdallClient) StateSyncEventsByTime(ctx context.Context, fromID uint64, toTime int64) ([]*clerk.EventRecordWithTime, error) {
// Set a 1-minute global timeout for the paginated fetch BEFORE callWithFailover
// applies its per-attempt attemptTimeout cap.
ctx, cancel := context.WithTimeout(ctx, 1*time.Minute)
defer cancel()

return callWithFailover(f, ctx, func(ctx context.Context, c Endpoint) ([]*clerk.EventRecordWithTime, error) {
return c.StateSyncEventsByTime(ctx, fromID, toTime)
})
}

func (f *MultiHeimdallClient) GetSpan(ctx context.Context, spanID uint64) (*types.Span, error) {
return callWithFailover(f, ctx, func(ctx context.Context, c Endpoint) (*types.Span, error) {
return c.GetSpan(ctx, spanID)