fix(recent-block-cache): Reduce fetch fan out and preserve negative cache entries. by metalurgical · Pull Request #4421 · cowprotocol/services

metalurgical · 2026-05-17T11:27:43Z

Description

Reduce fetch fan out and preserve negative cache entries.

Changes

Fetch cache miss chunks through fetch_values(keys, block) directly instead of expanding them into individual key requests
Preserve negative cache entries instead of removing empty entries
Add test for negative cache entries

How to test

cargo nextest run

…che entries Use fetch_values instead of join_all to reduce fan out. Preserve negative cache entries instead of removing empty entries.

claude

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

gemini-code-assist

Code Review

This pull request refactors the RecentBlockCache by removing the request_sharing dependency and simplifying the fetching logic into a single retry loop within fetch_inner_many. It also corrects a bug in the garbage collection process where negative cache entries (empty results) were being prematurely evicted, and adds a regression test to verify their persistence. Review feedback identified that the new retry implementation introduces unnecessary latency on the final failure, loses original error context by using anyhow::bail!, and violates repository guidelines regarding detailed error reporting for batch fetches.

gemini-code-assist · 2026-05-17T11:33:40Z

+        for _ in 0..=self.maximum_retries {
+            match self.fetcher.fetch_values(keys.clone(), block).await {
+                Ok(values) => return Ok(values),
+                Err(err) => tracing::warn!("retrying fetch because error: {:?}", err),
            }
-            .boxed()
-        });
-        shared.await.context("could not fetch liquidity")
+            tokio::time::sleep(self.delay_between_retries).await;
+        }
+        anyhow::bail!("could not fetch liquidity");


The current retry logic in fetch_inner_many has several issues:

Unnecessary Latency: It performs a tokio::time::sleep even after the final retry attempt fails, adding redundant delay to the overall request time.

Error Context Loss: By using anyhow::bail!, the original RPC error (e.g., timeout, node error) is discarded. Using .context() preserves the root cause for better troubleshooting.

Batch Error Detail: According to repository guidelines for batch fetches, the implementation should explicitly indicate which items failed and provide error details for each, rather than returning a single error for the entire batch. Please ensure the response structure reflects individual item statuses.

let mut keys = keys; for i in 0..=self.maximum_retries { let current_keys = if i == self.maximum_retries { keys } else { keys.clone() }; match self.fetcher.fetch_values(current_keys, block).await { Ok(values) => return Ok(values), Err(err) => { if i == self.maximum_retries { return Err(anyhow::Error::from(err).context("could not fetch liquidity")); } tracing::warn!(?err, "retrying fetch"); tokio::time::sleep(self.delay_between_retries).await; } } } unreachable!()

References

When fetching a batch of items where individual fetches can fail, do not silently ignore errors. The API response should explicitly indicate which items failed and provide error details for each failure.

Point 1 and 2 have been corrected.

Point 3 is a trait redesign, Result<Vec<V>> will need to become HashMap<K, Result<Vec<V>> in multiple places for this.

gemini-code-assist

Code Review

This pull request refactors the RecentBlockCache to implement batch fetching and simplified retry logic, removing the dependency on individual request sharing. It also fixes a bug in the cache's garbage collection logic that caused negative cache entries to be prematurely evicted and adds a regression test. A high-severity recommendation was made to include an early return for empty key sets in fetch_inner_many to avoid unnecessary RPC calls.

jmg-duarte · 2026-06-09T07:16:57Z

-        let retries = self.maximum_retries;
-        let delay = self.delay_between_retries;
-        let fetcher = self.fetcher.clone();
-        let shared = self.requests.shared_or_else((key, block), |entry| {


I don't understand why you're removing the request sharing, it's also not covered in the description

Previously it was doing individual fetches for each key, each going through request sharing. Now it does a single fetch for the key set. Request sharing no longer appears necessary here. Do you agree or do you see another reason to keep it?

MartinquaXD · 2026-06-10T06:52:47Z

-                *values = vec![];
+        for ((block, key), values) in self.entries.iter().rev() {
+            if !cached_keys.insert((*key).clone()) {
+                entries_to_remove.push((*block, (*key).clone()));


What's the motivation for this change? I'm not seeing how the other fan-out change makes this necessary. Are those 2 unrelated changes bundled together?

Yes, two different changes in the same file.

MartinquaXD · 2026-06-10T06:55:17Z


    async fn fetch_inner_many(&self, keys: HashSet<K>, block: Block) -> Result<Vec<V>> {
-        let fetched =
-            futures::future::join_all(keys.iter().map(|key| self.fetch_inner(key.clone(), block)))


From the PR description it's not really clear to me what the desired outcome of this change is. Which metric would you expect to see improved based on this change so that we can verify somehow that it works as desired.
I imagine due to doing 1 bulk request with n items instead of n individual requests there should be a lot less congestion on the mutex around the cache so lower latency which reduces latency. Is that the idea for this PR?

I wonder if this could backfire since we are now doing a TON of synchronous work before yielding the thread back to the runtime.

The metric that will improve here is the call count and the related overhead per call, especially on a cold cache.

Your concern at the end applies equally to the old code as well as this change, not a regression here.

fix(recent-block-cache): Use batched fetches and preserve negative ca…

7e8d755

…che entries Use fetch_values instead of join_all to reduce fan out. Preserve negative cache entries instead of removing empty entries.

metalurgical requested a review from a team as a code owner May 17, 2026 11:27

claude Bot reviewed May 17, 2026

View reviewed changes

gemini-code-assist Bot reviewed May 17, 2026

View reviewed changes

metalurgical marked this pull request as draft May 17, 2026 11:46

update: limit last retry and return error context

713ca9a

metalurgical marked this pull request as ready for review May 17, 2026 12:20

gemini-code-assist Bot reviewed May 17, 2026

View reviewed changes

Comment thread crates/liquidity-sources/src/recent_block_cache.rs

metalurgical added 8 commits May 17, 2026 14:25

update: prevent fetching if keys are empty

071fafa

Merge branch 'main' into block_cache_fix

450f944

Merge branch 'main' into block_cache_fix

5da5a08

Merge branch 'main' into block_cache_fix

3e3fac3

Merge branch 'main' into block_cache_fix

63c5d49

Merge branch 'main' into block_cache_fix

b453a9c

Merge branch 'main' into block_cache_fix

c96f987

Merge branch 'main' into block_cache_fix

8834101

jmg-duarte reviewed Jun 9, 2026

View reviewed changes

Merge branch 'main' into block_cache_fix

e238b8e

MartinquaXD reviewed Jun 10, 2026

View reviewed changes

metalurgical added 4 commits June 11, 2026 12:00

update: address review comment

ed7d99b

Merge branch 'main' into block_cache_fix

4974291

Merge branch 'main' into block_cache_fix

7eba39f

Merge branch 'main' into block_cache_fix

0918da7

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(recent-block-cache): Reduce fetch fan out and preserve negative cache entries.#4421

fix(recent-block-cache): Reduce fetch fan out and preserve negative cache entries.#4421
metalurgical wants to merge 15 commits into
cowprotocol:mainfrom
metalurgical:block_cache_fix

metalurgical commented May 17, 2026

Uh oh!

claude Bot left a comment

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot May 17, 2026

Uh oh!

metalurgical May 17, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

Uh oh!

jmg-duarte Jun 9, 2026

Uh oh!

metalurgical Jun 9, 2026

Uh oh!

MartinquaXD Jun 10, 2026

Uh oh!

metalurgical Jun 11, 2026

Uh oh!

MartinquaXD Jun 10, 2026

Uh oh!

metalurgical Jun 11, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

metalurgical commented May 17, 2026

Description

Changes

How to test

Uh oh!

claude Bot left a comment

Choose a reason for hiding this comment

Claude Code Review

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 17, 2026

Choose a reason for hiding this comment

Uh oh!

metalurgical May 17, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

jmg-duarte Jun 9, 2026

Choose a reason for hiding this comment

Uh oh!

metalurgical Jun 9, 2026

Choose a reason for hiding this comment

Uh oh!

MartinquaXD Jun 10, 2026

Choose a reason for hiding this comment

Uh oh!

metalurgical Jun 11, 2026

Choose a reason for hiding this comment

Uh oh!

MartinquaXD Jun 10, 2026

Choose a reason for hiding this comment

Uh oh!

metalurgical Jun 11, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants